Charles Isaac Mosier
1910-1951 


The death of Charles Mosier after a very short illness is a great 
loss to his friends, and a severe blow to work in the field of psycho- 
metrics. He was taken ill at his office on Monday, January 15, and 
died in the hospital the next afternoon with a diagnosis of meningitis. 

As Chief of Research and Analysis in the Personnel Research 
Section, The Adjutant General’s Office, he held one of the key posi- 
tions for directing military research in this country; he strove to 
make the numerous projects under his direction contribute both to 
the development of valuable military tools, and to the furtherance 
of psychology. 

He was born in Miami, Florida on June 11, 1910, and completed 
his college work at the University of Florida in 1932. The same year 
he was awarded an S. S. R. C. Fellowship and began his graduate
work in psychology at the University of Chicago. 

In the summer of 1933, Mary F. Fortis and he were married. 
Their daughter, Mary Fortis, was born a year later. He interrupted 
his graduate work to accept a position as Instructor in Psychology 
at the University of Florida in 1933. During the next four years he 
continued as instructor at Florida, and completed his work for the 
Ph.D. at Chicago. 

His doctoral thesis was on a multiple-factor analysis of neurotic 
symptoms. This was one of the early studies applying the methods 
of factor analysis to items, in an attempt to discover the structure 
of a non-cognitive domain. From 1937 to 1941 Dr. Mosier continued
his work as assistant professor and as a staff member of the Examin- 
er’s Office at the University of Florida. His papers dealt with vari- 
ous factor problems, such as the effects of random error, and the de- 
velopment of new methods of rotation. During this time he published 
his articles on the duality of psychophysics and test theory, which 
mark a new and very intriguing approach to problems in both fields. 

In 1941 he went to Washington, D. C. to work for the State 
Technical Advisory Service, Social Security Board, first as a Research 
Psychologist, and later as Chief of Methods and Analysis and Chief of 
Research and Test Construction. In these positions he was concerned 
with the development and validation of tests used for the selection 
of employees working for various federal and state agencies which 
administered the social security program. 

In 1946 he moved to the Civilian Personnel Division of the Office 
of the Secretary of War and in 1947 to the Personnel Research Sec- 
tion of the AGO. Since then, as Chief, Research and Analysis, he 
directed a staff of psychologists and assistants in the Pentagon, and 
has also aided in guiding the numerous outside psychological research 
projects financed by the AGO. It is fortunate that a person of his abil- 
ity and research acumen was available for a key position such as this. 
He was one of the founders of the journal Personnel Psychology, and 
was on the editorial boards of Psychometrika and Educational and 
Psychological Measurement. However, his attention to directive work 
of this sort was a loss in that it meant that he had less time to devote 
to his own original contributions to psychological theory, in particular
his important treatment of the duality of test theory and
psychophysics.

His many friends extend their sympathy to his wife and daugh- 
ter. They will remember Charlie Mosier not only as a contributor to 
psychology, but also as a cheerful and energetic companion and work- 
er who carried more than his share in any enterprise. 

HAROLD GULLIKSEN 

Princeton University 
REMARKS ON THE METHOD OF PAIRED COMPARISONS:
I. THE LEAST SQUARES SOLUTION ASSUMING 
EQUAL STANDARD DEVIATIONS 
AND EQUAL CORRELATIONS* 


FREDERICK MOSTELLER 
HARVARD UNIVERSITY 


Thurstone’s Case V of the method of paired comparisons assumes
equal standard deviations of sensations corresponding to
stimuli and zero correlations between pairs of stimuli sensations. 
It is shown that the assumption of zero correlations can be relaxed 
to an assumption of equal correlations between pairs with no change 
in method. Further the usual approach to the method of paired com- 
parisons Case V is shown to lead to a least squares estimate of the 
stimulus positions on the sensation scale. 


1. Introduction. The fundamental notions underlying Thur- 
stone’s method of paired comparisons (4) are these: 

(1) There is a set of stimuli which can be located on a subjective
    continuum (a sensation scale, usually not having a measurable
    physical characteristic).

(2) Each stimulus when presented to an individual gives rise to a
    sensation in the individual.

(3) The distribution of sensations from a particular stimulus for a
    population of individuals is normal.

(4) Stimuli are presented in pairs to an individual, thus giving rise
    to a sensation for each stimulus. The individual compares these
    sensations and reports which is greater.

(5) It is possible for these paired sensations to be correlated.

(6) Our task is to space the stimuli (the sensation means), except
    for a linear transformation.


*This research was performed in the Laboratory of Social Relations under 
a grant made available to Harvard University by the RAND Corporation under 
the Department of the Air Force, Project RAND. 
There are numerous variations of the basic materials used in
the analysis—for example, we may not have n different individuals, 
but only one individual who makes all comparisons several times; or 
several individuals may make all comparisons several times; the in- 
dividuals need not be people. 

Furthermore, there are “cases” to be discussed—for example, 
shall we assume all the intercorrelations equal, or shall we assume 
them zero? Shall we assume the standard deviations of the sensa- 
tion distributions equal or not? 

The case which has been discussed most fully is known as Thur- 
stone’s Case V. Thurstone has assumed in this case that the stand- 
ard deviations of the sensation distributions are equal and that the 
correlations between pairs of stimulus sensations are zero. We shall 
discuss a standard method of ordering the stimuli for this Case V. 
Case V has been employed quite frequently and seems to fit empirical 
data rather well in the sense of reproducing the original proportions 
of the paired comparison table. The assumption of equal standard 
deviations is a reasonable first approximation. We will not stick to 
the assumption of zero correlations, because this does not seem to be 
essential for Case V. 

2. Ordering Stimuli with Error-Free Data. We assume there
are a number of objects or stimuli, $O_1, O_2, \ldots, O_n$. These stimuli
give rise to sensations which lie on a single sensation continuum $S$.
If $X_i$ and $X_j$ are single sensations evoked in an individual $I$ by the
$i$th and $j$th stimuli, then we assume $X_i$ and $X_j$ to be jointly normally
distributed for the population of individuals with

$$
\begin{aligned}
\text{mean of } X_i &= S_i && (i = 1, 2, \ldots, n) \\
\text{variance of } X_i &= \sigma^2(X_i) = \sigma^2 && (i = 1, 2, \ldots, n) \\
\text{correlation of } X_i \text{ and } X_j &= \rho_{ij} = \rho && (i, j = 1, 2, \ldots, n).
\end{aligned}
\tag{1}
$$

The marginal distributions of the $X_i$’s appear as in Figure 1.
FIGURE 1


The Marginal Distributions of the Sensations Produced by the Separate 
Stimuli in Thurstone’s Case V of the Method of Paired Comparisons. 
The figure indicates the possibility that $X_2 < X_1$, even though $S_1 < S_2$.
In fact this has to happen part of the time if we are to build anything
more than a rank-order scale.

An individual $I$ compares $O_i$ and $O_j$ and reports whether
$X_i \gtrless X_j$ (no ties are allowed).

We can best see the tenor of the method for ordering the stimuli
if we first work through the problem in the case of nonfallible data.
For the case of nonfallible data we assume we know the true proportion
of the time $X_i$ exceeds $X_j$, and that the conditions given above
(1) are exactly fulfilled.

Our problem is to find the spacing of the stimuli (or the spacing
of the mean sensations produced by them, the $S_1, \ldots, S_n$ points in
Figure 1). Clearly we cannot hope to do this except within a linear
transformation, for the data reported are merely the percentages of
times $X_i$ exceeds $X_j$, say $p_{ij}$.


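$$
p_{ij} = P(X_i > X_j) = \frac{1}{\sqrt{2\pi}\,\sigma(d_{ij})}
\int_0^{\infty} \exp\left\{-\frac{[d_{ij} - (S_i - S_j)]^2}{2\sigma^2(d_{ij})}\right\} \, dd_{ij},
\tag{2}
$$

where $d_{ij} = X_i - X_j$ and $\sigma^2(d_{ij}) = 2\sigma^2(1 - \rho)$. There will be no
loss in generality in assigning the scale factor so that

$$
2\sigma^2(1 - \rho) = 1. \tag{3}
$$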


It is at this point that we depart slightly from Thurstone, who char- 
acterized Case V as having equal variances and zero correlations. 
However, his derivations only assume the correlations are zero ex- 
plicitly (and artificially), but are carried through implicitly with 
equal correlations (not necessarily zero). Actually this is a great 
easing of conditions. We can readily imagine a set of attitudinal 
items on the same continuum correlated .34, .38, .42, i.e., nearly 
equal. But it is difficult to imagine them all correlated zero with one 
another. Past uses of this method have all benefited from the fact 
that items were not really assumed to be uncorrelated. It was only 
stated that the model assumed the items were uncorrelated, but the 
model was unable to take cognizance of the statement. Guttman (2) 
has noticed this independently. 
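
The reason equal correlations cost nothing can be made explicit with a short worked step. It uses only the definitions in (1) and (2), and is offered as a sketch of the argument rather than as a quotation from Thurstone's derivation. The symbol $\Phi$ (the standard normal distribution function) is introduced here for convenience and does not appear in the text.

```latex
\begin{align*}
\sigma^2(d_{ij}) &= \sigma^2(X_i) + \sigma^2(X_j) - 2\rho\,\sigma(X_i)\,\sigma(X_j)
                  = 2\sigma^2(1-\rho), \\
p_{ij} &= P(d_{ij} > 0)
        = \Phi\!\left(\frac{S_i - S_j}{\sigma\sqrt{2(1-\rho)}}\right).
\end{align*}
```

The correlation enters only through the common divisor $\sigma\sqrt{2(1-\rho)}$, which is the same for every pair. Any common $\rho < 1$ therefore changes the unit of the scale and nothing else, which is why the choice in (3) loses no generality.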


With the scale factor chosen in equation (3), we can rewrite 
equation (2) 
$$
p_{ij} = \frac{1}{\sqrt{2\pi}} \int_{-(S_i - S_j)}^{\infty} e^{-y^2/2} \, dy. \tag{4}
$$


From (4), given any $p_{ij}$ we can solve for $-(S_i - S_j)$ by use of a
normal table of areas. Then if we arbitrarily assign as a location
parameter $S_1 = 0$, we can compute all other $S_i$. Thus given the $p_{ij}$
matrix we can find the $S_i$. The problem with fallible data is more
complicated.
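
The error-free procedure can be sketched in a few lines of Python, with the inverse normal distribution function standing in for the normal table of areas. The function name and the three-stimulus example are illustrative assumptions, not part of the paper; the unit of the scale is the one fixed by equation (3).

```python
from statistics import NormalDist


def scale_values_error_free(p_first_row):
    """Recover S_1..S_n (with S_1 = 0) from the true p_1j = P(X_1 > X_j)."""
    std_normal = NormalDist()  # mean 0, sd 1: the unit chosen in equation (3)
    # Equation (4) says p_1j = Phi(S_1 - S_j); with S_1 = 0 this gives
    # S_j = -Phi^{-1}(p_1j).
    return [-std_normal.inv_cdf(p) for p in p_first_row]


# Hypothetical true scale values, used only to manufacture error-free data.
true_s = [0.0, 0.5, 1.2]
p_row = [NormalDist().cdf(0.0 - s) for s in true_s]  # p_11 = 0.5
recovered = scale_values_error_free(p_row)
```

Only the first row of the $p_{ij}$ matrix is needed here, because with error-free data every other row carries the same information.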

3. Paired Comparison Scaling with Fallible Data. When we
have fallible data, we have $p'_{ij}$ which are estimates of the true $p_{ij}$.
Analogous to equation (4) we have

$$
p'_{ij} = \frac{1}{\sqrt{2\pi}} \int_{-D'_{ij}}^{\infty} e^{-y^2/2} \, dy, \tag{5}
$$

where the $D'_{ij}$ are estimates of $D_{ij} = S_i - S_j$. We merely look up the
normal deviate corresponding to $p'_{ij}$ to get the matrix of $D'_{ij}$. We
notice further that the $D'_{ij}$ need not be consistent in the sense that
the $D_{ij}$ were; i.e.,

$$
D_{ij} + D_{jk} = S_i - S_j + S_j - S_k = D_{ik}
$$

does not hold for the $D'_{ij}$.

We conceive the problem as follows: from the $D'_{ij}$ to construct
a set of estimates of the $S_i$’s called $S'_i$, such that

$$
\sum_{i,j} \left[ D'_{ij} - (S'_i - S'_j) \right]^2 \text{ is to be a minimum.} \tag{6}
$$

It will help to indicate another form of solution for nonfallible
data. One can set up the $S_i - S_j$ matrix:


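MATRIX OF $S_i - S_j$

$$
\begin{array}{c|ccccc}
 & 1 & 2 & 3 & \cdots & n \\ \hline
1 & S_1 - S_1 & S_1 - S_2 & S_1 - S_3 & \cdots & S_1 - S_n \\
2 & S_2 - S_1 & S_2 - S_2 & S_2 - S_3 & \cdots & S_2 - S_n \\
3 & S_3 - S_1 & S_3 - S_2 & S_3 - S_3 & \cdots & S_3 - S_n \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
n & S_n - S_1 & S_n - S_2 & S_n - S_3 & \cdots & S_n - S_n \\ \hline
\text{Totals} & \sum S_i - nS_1 & \sum S_i - nS_2 & \sum S_i - nS_3 & \cdots & \sum S_i - nS_n \\
\text{Means} & \bar{S} - S_1 & \bar{S} - S_2 & \bar{S} - S_3 & \cdots & \bar{S} - S_n
\end{array}
$$

Now by setting $S_1 = 0$, we get $S_2 = (\bar{S} - S_1) - (\bar{S} - S_2)$,
$S_3 = (\bar{S} - S_1) - (\bar{S} - S_3)$, and so on. We will use this plan
shortly for the $S'_i$.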

If we wish to minimize expression (6) we take the partial derivative
with respect to $S'_i$. Since $D'_{ij} = -D'_{ji}$ and $S'_i - S'_j =
-(S'_j - S'_i)$ and $D'_{ii} = S'_i - S'_i = 0$, we need only concern
ourselves with the sum of squares from above the main diagonal in the
$D'_{ij} - (S'_i - S'_j)$ matrix, i.e., terms for which $i < j$. Differentiating
with respect to $S'_i$ we get:

$$
\frac{\partial (\Sigma/2)}{\partial S'_i}
= 2 \left[ \sum_{j=1}^{i-1} (D'_{ji} - S'_j + S'_i)
         - \sum_{j=i+1}^{n} (D'_{ij} - S'_i + S'_j) \right]
\qquad (i = 1, 2, \ldots, n). \tag{7}
$$


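Setting this partial derivative equal to zero we have

$$
S'_1 + S'_2 + \cdots + S'_{i-1} - (n-1) S'_i + S'_{i+1} + \cdots + S'_n
= \sum_{j=1}^{i-1} D'_{ji} - \sum_{j=i+1}^{n} D'_{ij}
\qquad (i = 1, 2, \ldots, n), \tag{8}
$$

but $D'_{ij} = -D'_{ji}$, and $D'_{ii} = 0$; this makes the right side of (8)

$$
\sum_{j=1}^{i-1} D'_{ji} + D'_{ii} + \sum_{j=i+1}^{n} D'_{ji} = \sum_{j=1}^{n} D'_{ji}.
$$

Thus (8) can be written

$$
\sum_{j=1}^{n} S'_j - n S'_i = \sum_{j=1}^{n} D'_{ji}
\qquad (i = 1, 2, \ldots, n). \tag{9}
$$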


The determinant of the coefficients of the left side of (9) vanishes.
This is to be expected because we have only chosen our scale
and have not assigned a location parameter. There are various ways
to assign this location parameter, for example, by setting $\bar{S}' = 0$ or
by setting $S'_1 = 0$. We choose to set $S'_1 = 0$. This means we will
measure distances from $S'_1$. Then we try the solution (10) which is
suggested by the similarity of the left side of (9) to the total column
in the matrix of $S_i - S_j$.

$$
S'_i = \sum_{j=1}^{n} D'_{j1}/n - \sum_{j=1}^{n} D'_{ji}/n. \tag{10}
$$
Notice that when $i = 1$, $S'_1 = 0$ and that

$$
\sum_{i=1}^{n} S'_i = \sum_{i=1}^{n} D'_{i1}
$$

because

$$
\sum_{i=1}^{n} \sum_{j=1}^{n} D'_{ji} = 0,
$$

which happens because every term and its negative appear in this
double sum. Therefore, substituting (10) in the left side of (9) we
have

$$
\sum_{i=1}^{n} D'_{i1} - n \left[ \sum_{j=1}^{n} D'_{j1}/n - \sum_{j=1}^{n} D'_{ji}/n \right]
= \sum_{j=1}^{n} D'_{ji}, \tag{11}
$$


which is an identity, and the equations are solved. Of course, any 
linear transformation of the solutions is equally satisfactory. 
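
As a check on the algebra, the solution (10) can be sketched in Python and tested against the normal equations (9). The function name and the three-stimulus proportions are hypothetical; the sketch reads only the entries above the diagonal and enforces $D'_{ji} = -D'_{ij}$, which the derivation assumes.

```python
from statistics import NormalDist


def case_v_least_squares(p_obs):
    """Return (S', D'), where S' follows equation (10) with S'_1 = 0.

    p_obs[i][j] is the observed proportion of times stimulus i was judged
    greater than stimulus j; only entries above the diagonal are read.
    """
    n = len(p_obs)
    std_normal = NormalDist()
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = std_normal.inv_cdf(p_obs[i][j])  # D'_ij from equation (5)
            d[j][i] = -d[i][j]                          # D'_ji = -D'_ij
    # Column means: (1/n) * sum over j of D'_ji, one value per column i.
    col_mean = [sum(d[j][i] for j in range(n)) / n for i in range(n)]
    s = [col_mean[0] - col_mean[i] for i in range(n)]   # equation (10)
    return s, d


# Hypothetical, mutually inconsistent proportions for three stimuli.
p_example = [
    [0.5, 0.7, 0.9],
    [0.3, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]
s_hat, d_hat = case_v_least_squares(p_example)
```

With these conventions a stimulus that is more often judged greater receives the larger scale value, so in the example the second and third stimuli fall below $S'_1 = 0$.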

The point of this presentation is to provide a background for 
the theory of paired comparisons, to indicate that the assumption of 
zero correlations is unnecessary, and to show that the customary 
solution to paired comparisons is a least squares solution in the 
sense of condition (6). That this is a least squares solution seems 
not to be mentioned in the literature although it may have been 
known to Horst (3), since he worked closely along these lines. 

This least squares solution is not entirely satisfactory because
the $p'_{ij}$ tend to zero and unity when extreme stimuli are compared.
This introduces unsatisfactorily large numbers in the $D'_{ij}$ table. This
difficulty is usually met by excluding all numbers beyond, say, 2.0
from the table. After a preliminary arrangement of columns so that
the $S'_i$ will be in approximately proper order, the quantity

$$
\sum_i (D'_{ij} - D'_{i,j+1}) / k
$$

is computed where the summation is over the $k$ values of $i$ for which
entries appear in both column $j$ and $j+1$. Then differences between
such means are taken as the scale separations (see for example
Guilford’s discussion (1) of the method of paired comparisons). This
method seems to give reasonable results. The computations for methods
which take account of the differing variabilities of the $p'_{ij}$ and
therefore of the $D'_{ij}$ seem to be unmercifully extensive.
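
The truncated procedure can be sketched as follows. This is one reading of the description above, under stated assumptions: "beyond 2.0" is taken to mean an absolute value greater than the limit, diagonal zeros count as entries, and the columns are assumed to be already arranged in approximately the proper order. The function name is hypothetical.

```python
def adjacent_separations(d, limit=2.0):
    """Mean of D'_ij - D'_i,j+1 over the rows usable in both columns.

    d is the square D' matrix; entries with |D'| > limit are treated as
    excluded from the table. Returns one separation per adjacent column
    pair, or None where no row has entries in both columns.
    """
    n = len(d)
    separations = []
    for j in range(n - 1):
        diffs = [
            d[i][j] - d[i][j + 1]
            for i in range(n)
            if abs(d[i][j]) <= limit and abs(d[i][j + 1]) <= limit
        ]
        separations.append(sum(diffs) / len(diffs) if diffs else None)
    return separations


# Consistent illustrative data built from hypothetical scale values.
s_true = [0.0, 1.0, 2.5, 3.0]
d_true = [[si - sj for sj in s_true] for si in s_true]
seps = adjacent_separations(d_true)
```

For consistent data each difference $D_{ij} - D_{i,j+1}$ equals $S_{j+1} - S_j$, so the averages reproduce the spacing between neighbouring stimuli even though the extreme entries have been dropped.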

It should also be remarked that this solution is not entirely a
reasonable one because we really want to check our results against
the original $p'_{ij}$. In other words, a more reasonable solution might
be one such that once the $S'_i$ are computed we can estimate the $p'_{ij}$
by $p''_{ij}$, and minimize, say,

$$
\sum (p'_{ij} - p''_{ij})^2
$$

or perhaps

$$
\sum \left( \arcsin \sqrt{p'_{ij}} - \arcsin \sqrt{p''_{ij}} \right)^2 .
$$

Such a thing can no doubt be done, but the results of the author’s
attempts do not seem to differ enough from the results of the present
method to be worth pursuing.
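
For a given set of scale values, the two criteria mentioned above can be evaluated directly. The sketch below only computes them; it does not minimize them. The function name is hypothetical, and summing over pairs with $i < j$ is an assumption, since the text leaves the range of summation unstated.

```python
from math import asin, sqrt
from statistics import NormalDist


def fit_criteria(p_obs, s_est):
    """Return (sum of squared p differences, sum of squared arcsine differences).

    The fitted proportion is p''_ij = Phi(S'_i - S'_j), in the unit of
    equation (3). Only pairs with i < j are summed (an assumption).
    """
    std_normal = NormalDist()
    n = len(s_est)
    ss_p = 0.0
    ss_arcsin = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p_fit = std_normal.cdf(s_est[i] - s_est[j])
            ss_p += (p_obs[i][j] - p_fit) ** 2
            ss_arcsin += (asin(sqrt(p_obs[i][j])) - asin(sqrt(p_fit))) ** 2
    return ss_p, ss_arcsin


# Hypothetical scale values and the proportions they imply exactly.
s_demo = [0.0, -0.5, -1.3]
p_exact = [[NormalDist().cdf(a - b) for b in s_demo] for a in s_demo]
exact_fit = fit_criteria(p_exact, s_demo)

# Perturb one observed proportion to see the criteria respond.
p_noisy = [row[:] for row in p_exact]
p_noisy[0][1] += 0.05
noisy_fit = fit_criteria(p_noisy, s_demo)
```

Both criteria are zero when the observed proportions are reproduced exactly and grow as the fit worsens; the arcsine form weights discrepancies near zero and unity more heavily than the plain difference does.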
THE DIMENSIONS OF TEMPERAMENT*


L. L. THURSTONE 
THE UNIVERSITY OF CHICAGO 


The correlations among the thirteen personality scores yielded 
by the Guilford schedule for factors STDCR, and the Guilford-Mar- 
tin schedules for factors GAMIN, and O, Ag, and Co, as reported 
by Lovell, were factored by the centroid method. The purpose was 
to see how many factors were represented by the thirteen scores; 
therefore the test reliabilities were used in the diagonal cells. It 
was found that the scores represent not more than nine linearly 
independent factors. The orthogonal factor matrix was rotated to 
oblique simple structure. Seven of the oblique factors were given 
tentative interpretation. Two factors were regarded as residual
factors because of the small variance which they represent. The 
seven factors have been named Active, Vigorous, Impulsive, Domi- 
nant, Stable, Sociable, and Reflective. 


The purpose of this study was to determine the number of factors 
or dimensions that are implied in current personality schedules, and 
also to ascertain the nature of each factor or type. The several sched- 
ules of Guilford were chosen for this purpose because they represent 
careful analytical work. Each of his schedules has previously been 
analyzed factorially, and correlations have been determined between 
the separate scores for his schedules. 

The various personality schedules cover a wide range of personal 
characteristics, including those which are relatively permanent for 
each person as well as those which change more or less from one year 
to the next because of social experience. Most of the scores derived 
from the Guilford schedules represent relatively permanent character- 
istics of a person which may be called temperamental traits. Some 
personality scores, such as appraisals of attitudes on controversial so- 
cial questions, represent only partly the temperamental characteristics 
of a person. Such scores also reflect his recent social experience, his 
social identifications, and the propaganda to which he may have been 
exposed. They are less stable as indicators of temperamental types. 
Our interest here is in those non-intellective traits of personality 
which are relatively stable, the temperamental types, and which are
not often markedly changed in social experience. Hence we refer to
this problem as the dimensions of temperament rather than the much
larger domain that is called personality.

*This study was supported in part by a research grant from Sears Roebuck
and Company. The writer wishes to acknowledge in particular the interest and
assistance of Mr. J. C. Worthy of the National Personnel Department at Sears
Roebuck and Company. The writer also wishes to acknowledge the assistance
of Mr. James Degan who was responsible for the computing in this study.

Guilford has produced three personality schedules that were used 
in the present study. These were Guilford’s schedule for the scores 
STDCR, the Guilford-Martin schedule for the scores GAMIN, and the 
Guilford-Martin schedule for the scores O, Ag, and Co.* Each of the 
first two schedules gives five scores, and the third schedule gives three 
scores. Hence the schedules give thirteen separate scores, all of 
which were used in the present study. 

The correlations among the thirteen scores were reported recently 
by Lovell who gave all three schedules to 213 subjects.† She made a
factor analysis of the thirteen scores in which the communalities were 
determined by their intercorrelations. This is the usual procedure, but 
in the present case it should be recalled that the thirteen scores were 
themselves determined as factor scores from the original question- 
naires that contained many hundreds of items. Hence the procedure of 
Lovell was essentially to investigate the second-order domain in the
thirteen factor scores. This is an interesting and important problem. 
The second-order domain in the traits of temperament may be psycho- 
logically revealing. But before undertaking such a study, it would be 
preferable to make sure that the factor scores which enter into a sec- 
ond-order analysis are linearly independent. Lovell questions the 
linear independence of the thirteen scores in her opening statement. 
She says: “The original studies showed that the thirteen factors were 
not completely independent of each other though they were sufficiently 
separate to make individual scores helpful.” Test scores may be very 
useful even though they are not linearly independent, but such a situ- 
ation introduces reservations about a second-order analysis. 

In the present study we direct ourselves first to the main problem, 
namely, to determine the number of dimensions or factors in these 
personality schedules which are represented by thirteen separate 
scores. This is the same problem that Lovell mentions in introducing 
her study. Instead of dealing with the thirteen scores as variables 
whose common factors are to be ascertained, we want to know how 
many factors are represented in the thirteen scores. For this purpose 
we make the factorial analysis with the test reliabilities in the diagonal
cells. If a second-order analysis is to be made of these thirteen scores,
then the common factor variances, the communalities, are recorded in
the diagonal cells as was done in Lovell’s paper.

*Guilford, J. P., and Guilford, R. B. Personality factors, S, E, and M, and
their measurement. J. Psychol., 1936, 2, 107-127; Personality factors, D, R,
T, and A. J. abnorm. soc. Psychol., 1939, 34, 21-36; Personality factors N and
GD. J. abnorm. soc. Psychol., 1939, 34, 239-248.

†Lovell, Constance. A study of factor structure of thirteen personality
variables. Educ. psychol. Meas., 1945, 5, 335-350.

The thirteen scores from the Guilford Schedules are listed in 
Table 1. Each trait is shown by Guilford’s name for the trait and by 
his code symbol. Then follow some items indicative of the presence of 
the trait (positive items) and some items that indicate the absence of 
the trait (negative items). Then follow further sample items from the 
schedules. Most of Guilford’s scores are defined by a mixture of posi- 
tive and negative items. A few of the scores have a preponderance of 
negative items in which a subject gets a high score in a trait by acknowledging
the opposite trait. For bipolar traits, this is legitimate,
especially when both directions are well represented by questions. 
When only one of the two poles is well defined by questions, it seems 
preferable to let the well defined pole carry the trait name even if it 
is not regarded as the socially preferable side of the bipolarity. We 
have given here Guilford’s trait names and the direction of each bi- 
polarity to which he assigns the numerically higher scores. When he 
plots a profile of percentile norms, he reverses some of the scores so 
as to represent the socially more desirable end of the bipolarity with 
the numerically higher percentile ranks. 

In some of the schedules, there is a good balance between “yes” 
and “no” answers that indicate presence of the trait, but in several
of the schedules there is a large majority of one type of answer for 
the high scores. The Cycloid score is determined from 68 items that 
are positively scored with “yes” answers while there are only five 
items with “no” answers for the same trait. The corresponding ra- 
tios for other schedules are General Activity 21 and 3, Nervousness 
5 and 38, Objectivity 2 and 45, Agreeableness 2 and 36, Cooperative- 
ness 4 and 56. Some of these traits seem to be more easily described 
by the socially less desirable side of the bipolarity. 

The correlations reported by Lovell are reproduced in our Table 2. 
In the diagonal cells of this correlation matrix we have recorded the 
test reliabilities which are also reported by Lovell.* The question is
now to determine the rank of this correlation matrix. The matrix was
factored by one of the centroid methods† and the result is shown in our
Table 3. In making this factorial reduction, we did not adjust the diagonal
values because our object was to find the number of dimensions
in the test scores and this is not necessarily the rank of the reduced
correlation matrix. This objective excludes the error variance in each
test score but we do not limit ourselves to the factors that may be common
to the test scores. We want to analyze the dimensionality of the
test content of the thirteen scores, excluding their error variance.

*The writer agrees with a reservation that has been made by one of the
reviewers of this paper, but it probably does not invalidate an approximate
determination of the dimensionality of the thirteen scores. The reservation
is as follows:

The author might well mention two conditions that have important bearings
on his analysis. The reliabilities reported by Lovell were taken from the test
manuals and were therefore not based upon the same population as the
intercorrelations. Accuracy of these values would be very important in establishing
the dimensionality of the factors. Many of the intercorrelations are spuriously


The orthogonal centroid factor matrix of Table 3 shows nine fac- 
tors. The distribution of ninth factor residuals is shown in Table 4. We 
now have the answer to our first problem in the result that, for prac- 
tical purposes, the thirteen personality scores represent not more 
than nine factors. Hence the thirteen scores are linearly dependent. 


A good structure was obtained in solving the rotational problems 
for these data. The transformation matrix A is shown in Table 5, by 
which the nine orthogonal centroid axes are replaced by seven oblique 
axes which have been given interpretation and by two residual axes. 
The last two axes are given only tentative interpretation and they may 
be left as residual factors because of the small variance which they 
represent. 

In Table 6 we have the oblique factor matrix for the seven factors
with interpretation and two residual factors that are denoted $X_1$ and
$X_2$, respectively. The thirteen scores from the Guilford schedules are
represented here in the socially favorable forms in accordance with the 
scoring on Guilford’s profiles and as they are represented in Lovell’s 
correlation matrix. We shall now give tentative interpretation to the 
seven significant primary factors of this V matrix. 


In the first column there are eight zero loadings and five signifi- 
cant loadings. The strongest of these is Thinking Introversion with 
.76. The next strongest saturation is -.41 on Rhathymia.‡ The three
less conspicuous loadings are Social Introversion with .29, Depression 
with .35, and Emotional Instability with .26. Inspection of the items 
leads to the generalization that this primary factor can be called
introversion or introspection. A more general descriptive adjective is
Reflective which covers most of the traits in a descriptive sense without
implying socially favorable or unfavorable implications. The primary
factor is denoted R.

high due to the fact that some items were scored for more than one trait. Such
scores therefore have common error variance which adds materially to the
apparent common-factor variance.

†Thurstone, L. L. Multiple factor analysis. Chicago: Univ. of Chicago
Press, 1947. Chap. VIII, pp. 161-170.

‡The signs have been adjusted to agree with the reversal of trait names.

The second column has only two large significant saturations, 
namely, those for Agreeableness and Cooperativeness with small saturations
for Objectivity, Freedom from Nervousness, and Rhathymia.
We have generalized this primary factor in the descriptive adjective 
Sociable with the symbol S. 

The third column has two significant saturations, namely, those 
for Emotional Stability and Freedom from Depression with lower sat- 
urations on Social Extraversion, Thinking Extraversion, and on Free- 
dom from Nervousness. We generalize this primary factor in the des- 
criptive adjective Emotionally Stable, with the symbol E. 

The next column has only one large saturation, namely, that for 
Masculinity with a smaller saturation on Freedom from Nervousness. 
This primary factor can be generalized in the adjective Vigorous with 
the symbol V. 

The next column has two significant saturations on Ascendance 
and Extraversion with no other significant loadings. We have general- 
ized this primary factor in the descriptive adjective Dominant in the 
sense of social leadership with the symbol D. 

The next column has two significant saturations on General Ac- 
tivity and Cooperativeness with slight saturations on Objectivity and 
Ascendance. We have generalized this primary factor in the descrip- 
tive adjective Active with the symbol A. 

The seventh column has only two significant entries, namely, 
those for General Activity and for Rhathymia. A small saturation on 
Freedom from Inferiority Feelings is consistent with the generaliza- 
tion of this primary factor in the descriptive adjective Impulsive with 
the symbol I. 

The residual factor $X_1$ might be called Self-confidence and we
were tempted to denote it C, but the saturations are small with the
highest loading of .35; and it seemed more appropriate not to include
it in a list of primary traits until its independence and significance can
be demonstrated more clearly. The second residual factor $X_2$ was also
left without interpretation since its highest saturation is only .29 on
Freedom from Nervousness.

In Table 7 we have the intercorrelations of the seven primary fac- 
tors to which we have attempted to give interpretation. The most con- 
spicuous intercorrelation is that of Impulsiveness and Dominance 
which is .71. These two factors are clearly separated in the oblique 
factor matrix V so that by the present data they are clearly distinguished.
Another significant correlation between primaries is that of
Sociability and Emotional Stability which is .52. The separation be- 
tween these two primaries is also clear in the oblique factor matrix V. 

The thirteen scores obtained from the several personality sched- 
ules of Guilford represent a dimensionality of not more than nine 
linearly independent factors. Since the variance of two of these factors 
is rather small, the dimensionality of the thirteen scores is not more 
than seven independent factors for practical purposes. The analysis 
was made in terms of the test space from which only the error vari- 
ance was eliminated. Hence the reliability coefficients were used in the 
diagonal cells of the correlation matrix for this analysis. A similar 
analysis with communalities in the diagonals would probably give a 
smaller number of dimensions since it would be limited to those factors 
which are shared by two or more of the thirteen scores. That was not 
the purpose of the present study. 
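
The effect of the diagonal entries on the number of dimensions can be illustrated with a toy example. The sketch below is not the centroid method used in this study: it substitutes plain Gaussian elimination to count the rank, and the three scores, their loadings, and the specific variance of .25 are all hypothetical. One shared factor underlies the three scores, and the third score also carries reliable variance that it shares with no other score.

```python
def numerical_rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting (not the centroid method)."""
    rows = [[float(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot: this column adds no new dimension
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def correlation_matrix(common, diagonal):
    """Off-diagonal cells from one common factor; diagonal cells as supplied."""
    n = len(common)
    return [
        [diagonal[i] if i == j else common[i] * common[j] for j in range(n)]
        for i in range(n)
    ]


common = [0.8, 0.7, 0.6]                    # hypothetical common-factor loadings
communalities = [a * a for a in common]     # shared variance only
reliabilities = communalities[:2] + [communalities[2] + 0.25]  # plus specific variance

rank_with_communalities = numerical_rank(correlation_matrix(common, communalities))
rank_with_reliabilities = numerical_rank(correlation_matrix(common, reliabilities))
```

With communalities in the diagonal the matrix has rank one, since only the shared factor is counted. With reliabilities in the diagonal the rank rises to two, because the reliable variance peculiar to the third score is admitted as a dimension of the test space.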

The seven dimensions of the thirteen scores for which interpreta- 
tion has been attempted were tentatively named Reflective (introspec- 
tive), Impulsive, Sociable, Active, Dominant (leadership), Vigorous, 
and Emotionally Stable. These primary factors were given the symbols
R, I, S, A, D, V, and E, respectively. The simple structure that was
found in this configuration of seven dimensions was very marked, as 
shown in the large number of vanishing entries in the oblique factorial 
matrix V. The structure can be seen even more clearly on a diagram 
for each pair of columns in which the saturations of one column are 
plotted against the corresponding saturations in the other column. 

This variant interpretation of Guilford’s work on personality fac- 
tors does not deny the existence of many more factors in this domain. 
The thirteen traits that are described and named by Guilford can be 
very useful even though they are not linearly independent. In general, 
there would be preference for a set of descriptive profile categories in 
which each column contributes some information that cannot be ob- 
tained as a weighted score of the other columns. That is, of course, 
what is meant by linear independence. 

We started this analysis with the expectation of finding bipolar 
factors for all or most of these factors; but the result revealed all of 
them to be positive. In naming the factors we tried to avoid those terms 
which refer explicitly to the more abnormal aberrations of tempera- 
ment or personality, such as depression and cycloid disposition. Such 
concepts refer to the psychiatric extremes, but they have correlates in 
terms that refer to the less severe deviations within the normal range 
of temperament. When schedules of this kind are used for the
description of personality among subjects who are in the normal range,
it seems preferable to use terms which avoid as far as possible the
comparison of a normal subject with the abnormal extremes. This is
probably good policy in describing the temperaments of normal subjects
even though it is recognized that there is no sharp demarcation be- 
tween the normal and the abnormal in each of the factors or dimen- 
sions. 


Manuscript received 5/16/50 
Revised manuscript received 10/9/50 


TABLE 1 
Guilford’s Thirteen Scores 


S: Social Introversion 

Positive: Shyness, seclusiveness, tendency to withdraw from social contacts. 

Negative: Sociability, tendency to seek social contacts, to enjoy company of 
others. 

Sample items: Limits acquaintances to a select few, keeps quiet in social groups, 
difficulty in starting conversation with strangers, frequent loneliness, spends 
evenings alone, takes life seriously, bashfulness, lets others take the lead. 


T: Thinking Introversion 

Positive: Inclination to meditative or reflective thinking, philosophizing, analyz- 
ing one’s self and others. 

Negative: Extravertive orientation in thinking. 

Sample items: Analyzes motives of others, ponders over the past, takes life 
seriously, works on complicated problems, often lost in thought, much atten- 
tion to details, often moody, works better when praised. 


D: Depression 

Positive: Habitually gloomy, pessimistic mood, feelings of guilt. 

Negative: Cheerfulness and optimism. 

Sample items: Often moody, self-conscious, daydreams frequently, often worries,
frequent ups and downs in mood, feelings easily hurt, loneliness, difficulty
in making decisions, feelings of inferiority, often excited.


C: Cycloid 

Positive: Strong emotional fluctuations, tendency toward flightiness, emotional 
instability. 

Negative: Uniformity in mood, evenness of disposition. 

Sample items: Moody, acts on the spur of the moment, works better when 
praised, changes work frequently, daydreams, worries, ups and downs in 
mood, feelings easily hurt, impulsive, interests change quickly, lonely, high- 
strung, absent-minded. 


R: Rhathymia

Positive: Happy-go-lucky, carefree disposition, lively, impulsive.

Negative: Inhibited, over-controlled, conscientious, serious-minded.

Sample items: Carefree, acts on spur of the moment, impulsive, craves excitement,
jumps at conclusions, lively, plays pranks on others, restless.


G: General Activity 

Positive: General pressure for vigorous activity. 

Sample items: Quick in actions, eats rapidly, walks fast, “on the go,” starts 
work with enthusiasm, hurries, talkative, impulsive, daredevil, group leader. 


A: Ascendance 

Positive: Social Leadership 

Sample items: Easily starts conversation with strangers, good at bluffing, or- 
ganizer, takes social initiative, likes public speaking, takes responsibility, 
takes charge in case of accident, stands up for his rights, a good salesman. 


M: Masculinity 

Positive: Masculinity in emotional and temperamental make-up. 

Sample items: Wants to be physically strong, not afraid of the dark, likes hunting,
likes to take a chance, not afraid of deep water, not sorry for underdog,
not afraid of snakes, preference for mathematics, science, politics, building
trades, mining, prize fights, rather than literature, music, flowers, art, dancing.


I: Inferiority Feelings 

Positive: Lack of confidence, undervaluation of one’s self, feelings of inadequacy. 

Sample items: Often feels thwarted, bossed around too much, often bored, slow 
emotional recovery from emotional upset, awkward, craves encouragement, 
absent-minded, unpopular, easily discouraged, slow in making decisions. 


N: Nervousness 
Positive: Jumpiness, jitteriness, easily distracted, irritated, easily annoyed. 
Negative: Calm, unruffled, relaxed. 


O: Lack of objectivity 

Positive: Takes everything personally, hypersensitive, easily upset, nervous, dis- 
turbed by criticism, readily unburdens his troubles to others, easily offended, 
or annoyed. 


Ag: Lack of Agreeableness 

Positive: Does not like to take instructions from others, feels that most people 
are stupid, hates to lose an argument, dislikes many people, takes pleasure 
in bossing people, selfish, frequently in conflict, contempt for opinions of 
others, self-confident about his own abilities, “hard-boiled.” 


Co: Lack of cooperativeness 

Positive: Lack of faith in people, believes most people shirk their duties, dislikes 
his superiors, against large business corporations, dislikes traffic regulations, 
distrustful of all successful people.


TABLE 2
Correlation Matrix for Guilford’s Thirteen Scores

1 00 42 64° 44 66 388.73 20 69.38 47 14 22 
oil 42 £84 65 .59 80-.07 .20 .21 34 89 .41 .17 .24 
a 64 .65 .94 90 .28-04 48 82 .74 .71 .75 .84 .44 
TALE 6 44 59 .90 .88 -02 -19 .81 .83 .68 .70 .72 .85 .42 
eee 66 .80 .28 -.02 .90 .56 .538 .04 .27 .08 .21 -.08 -.02 
6. G .88 -—.07 -.04 —19 .56 .89 .44 -07 .09 -.23 -.06 -.81 -.17 
40 BR “A320. 4881 68-44 88 6. .6T.-B8 -46 00-20 
& M ©.10 21 °-32)88  08=07 26 45> 83:86 87 .0b 21 
- Aaa | 9. OS 14°68 27-09" ).67 68 Of ..67 45 36° .45 
10. N  .88 .89 .71 .70 08 —28 88 85 .67 .89 .72 AT 58 
mw @ 47 41...75 .72 21-06 46 37 16 12 88 .50 62 
12. Ag .14 .17 .84 .85 -08 -—31 .00 01 .85 .47 .50 .80 .68 
18. Co 22 .24 44 42:—02:—4%7 20:21 45 68 62 68 51 
TABLE 3 
Orthogonal Factor Matrix F 

I II III IV V VI VII VIII IX

i -75 — AT 18 .05 20 —13 —.07 —.11 .09 

2. 58 13 58 10 —.26 —.04 .03 ll —.15 

3. .88 19 28 —.14 10 —.02 .07 —.06 .08 

4, Pil 41 28 —.25 13 .04 15 —.02 .04 

5. 45 —.70 13 25 —.16 16 —.22 14 .05 

6. 15 —79 —17 —12 —.18 .25 27 —.14 .04 

if 67 —50 —17 —.17 14 —.29 .06 16 —.08 

8. 41 20 —.22 —38 —.50 —22 —.32 .09 07 

9. .83 07 —15 —.18 19 14 —.08 ll 14 

10. 74 29 —.08 —.09 13 12 —16 —09 —.18 

11. .83 25 —.14 .03 .08 .09 .06 17 07 

12. 42 44 —.22 51 28 08 —.05 —.11 —.08 

13. 58 388 —.35 42 —08 —.14 21 —.11 18 

TABLE 4
Frequency Distribution of 9th Factor Residuals
N = 156

Residual:   -.05  -.04  -.03  -.02  -.01   .00   .01   .02   .03
Frequency:     2     2     8    22    26    42    32    12    10

TABLE 5
Transformation Matrix A
R S E V D A I X1 X2
I 26 .30 30 17 16 16 16 15 12 
II —.06 13 10 08 -—28 —.09 —.32 .03 04 
III 80 —.40 48 —21 —03 —34 —15 —17 —.03 
IV 34 80 —.28 —.23 01 .08 .04 —.28 .08 
VV —42 —.20 —.04 —.56 387 —384 —.32 28 —.15 
VI —.02 .08 07 —11 —85 —.01 .85 18 14 
VII 03 —.02 00 —72 —.04 84 a4. 21 —.06 
VIII 06 —.22 —.76 —17 —.04 —11 —.01 82 —.17 
IX .00 .00 .00 .00 —.16 .00 .00 .00 —.95 
TABLE 6 
Oblique Factor Matrix V 
Guilford’s Scores R S E V D A I X1 X2
1-S_ Social Extraversion -—.29 .11 2322 .01 .42 -01 .06 -.08 -.04 
2-T Thinking Extraversion -.76 .06 .386 .07 00-01 .02 -.02 .22 
38-D Freedom from Depression -35 .05 .50 .04 .12 .05 -.01 .18 .01 
4-C Emotional Stability —26 -05 .50-.02 .00 .05 -.05 .22 .02 
5-R Rhathymia -—41 .21-03 .14 .07 -04 .45 -.02 .03 
6-G General Activity 02 .00 .05 -07 -04 44 .60 .02 .01 
7-A  Ascendance .03 -.02 -.08 .08 .55 .18 .00 .80 .04 
8-M Masculinity 00 .00 .08 .74 .01 -.01 -—04 .01 .02 
9-I Freedom from Inferiority 
Feelings 05 .12 .15 .14 .04 .02 .17 .85 -.06 
10-N Freedomfrom Nervousness -.01 .24 .82 .24 .00 -07 .05 .10 .29 
11-0 Objectivity -.08 381 .07 .06 .00 .16 .18 .84 .02 
12-Ag Agreeableness .03 .66 .00 -.05 -.01 .03 —.03 -.06 .19 
18-Co Cooperativeness -.03 .72 .00 .03 O07 .48 -.03 -.03 .00 
TABLE 7 
Correlations between Primary Factors 
R S E V D A I X1 X2
R 1.00 —.11 —.23 15 07 11 —.O1 06 —.02 
Ss —.11 1.00 52 —.03 01 —37 —.15 56 —.14 
E —.23 62 -300 - -06 04 —18 —.10 66 —.12 
V 15 —.03 .05 = 1.00 .03 382 —.11 30 —.09 
D 07 01 04 03 100 —.17 72 03 —.19 
A 11 —37 —.18 02 —17 4100 —.26 —.16 04 
I —.01 —15 —10 —.11 ‘711 —26 100 —.19 —.22 
X, .06 56 66 30 03 —16 —19 1.00 —.01 
Xx, —.02 —.14 —12 —.09 —.19 04 —22 —01 1.00 
A NOTE ON CORRECTING FOR CHANCE SUCCESS
IN OBJECTIVE TESTS 


SAMUEL B. LYERLY 
THE UNIVERSITY OF NORTH CAROLINA 


The conventional scoring formula to “correct for guessing” is
derived and is compared with a regression method for scoring which
has been recently proposed by Hamilton. It is shown that the usual
formula, S = R - W/(k - 1), yields a close approximation (correct
within one point) to the maximum-likelihood estimate of an
individual’s “true score” on the test, if we assume that the
individual “knows” or “does not know” the answer to each item, that
guessing at unknown items is random, and that success at guessing is
governed by the binomial law. It is also shown that the usual scoring
formula yields an unbiased estimate of the individual’s “true score,”
when the true score is defined as the mean score over an indefinitely
large number of independent attempts at the test or at equivalent
(parallel) tests.


Hamilton’s recent paper on correcting for guessing in objective 
tests* proposes a method which differs greatly from that which most 
test technicians have customarily used. Hamilton advocates the use 
of a regression equation based upon the known or assumed distribu- 
tion of examinee knowledge, i.e., the distribution of scores which 
would be obtained if guessing were excluded and each individual 
answered only those items which he “knew” and refrained from 
marking those which he did not know. Assuming a binomial dis- 
tribution of examinee knowledge, and assuming further that every 
examinee makes a response to every item and that the relative fre- 
quency of successful responses on those items whose answers are not 
known will be governed by the binomial law, Hamilton derives an 
equation of the form 


$$S_i = \frac{(k\bar{R} - n)\,R_i}{(k - 1)\,\bar{R}}, \tag{1}$$

where

$S_i$ = the estimated true score for Individual $i$,

$k$ = the number of alternatives per item,

$R_i$ = the raw score (number of items correctly answered) for Individual $i$,

$\bar{R}$ = the mean raw score for the group of $N$ individuals, and

$n$ = the number of items in the test.


*Hamilton, C. Horace. Bias and error in multiple-choice tests. Psychometrika,
1950, 15, 151-168.


22 PSYCHOMETRIKA



The factor $(k\bar{R} - n)/[(k - 1)\bar{R}]$ in Eq. (1) is the regression
coefficient of true scores on raw scores for the group of individuals
under consideration. Hamilton shows that this regression is in general
non-linear, depending upon the form of the distribution of examinee
knowledge, and departing from linearity as the distribution of true
scores departs from the binomial. He presents alternative formulas for
use when some or all of the examinees do not complete the test and
when the assumed distribution of examinee knowledge is not binomial.


Hamilton considers that the conventional scoring formula, which
is of the form

$$S_i = \frac{kR_i - n}{k - 1}, \tag{2}$$

is a mistaken one, based upon the regression of raw scores upon true
scores.


It is the purpose of this note to present a complete derivation 
of the conventional scoring formula and to show that under the as- 
sumptions of (1) perfect knowledge or perfect ignorance on each 
item of the test, (2) “pure” random guessing at unknown items, and 
(3) a binomial distribution of success at guessing on unknown items 
(assumptions which Hamilton also makes), the usual scoring method 
yields a value which is a close approximation to the maximum-likeli- 
hood estimate of the desired true score and is, in addition, an un- 
biased estimate of the true score defined as the limit approached by 
the mean estimated score in an indefinitely large number of attempts 
at the same test (or parallel tests). 


Test technicians have not generally considered the problem of 
correcting for guessing as a regression problem in the ordinary sense 
of the term, but as a problem in sampling. We have a sample of
responses (the test responses of Individual i) and we wish to estimate
therefrom a “true score”; i.e., we wish to determine the true score
which would make the obtained raw score most probable. We are
thus making an inference from sample to population. It must be
noted that the “population” here is the class of true scores to which
Individual i’s true score belongs, not the class of true scores of all
members of some group of individuals. Similarly, the “sample” under
consideration is the single set of test responses of Individual i
rather than a set of scores made by a sample of individuals. These
distinctions are fundamental for a comparison of Hamilton’s scoring 
system with the conventional one. 


The test user who employs the correction formula may have 
either of two purposes in mind: (1) He may wish to estimate the 
exact number of answers the subject knows on a single test, or (2) 
he may wish to estimate the mean “true score” of the subject in a 
series of trials at the test (or at parallel tests). We shall consider 
the two problems in order. 


1. Estimating the Exact Number of Answers Known on a Single Test 


Let us suppose that Individual i makes a raw score $R_i$ on a test
of n items each having k alternatives. We shall assume that $R_i$ is the
sum of an unknown number $S_i$, which represents the number of
items to which he knows the answers, and an unknown number
$(R_i - S_i)$ of items for which he receives credit as a consequence of
lucky random guesses. We wish to estimate $S_i$ under the assumption
that success at guessing on the $(n - S_i)$ unknown items is binomially
distributed. (Since we are dealing with one individual, we shall omit
the subscript.)


There are R + 1 hypotheses which we may entertain concerning
the value of S. We may list them as follows:

H(S=R): The subject knew the answers to R items, guessed at the
remaining (n - R) items, and was unsuccessful in each case. S = R.

H(S=R-1): The subject knew the answers to (R - 1) items and guessed
at the remaining (n - R + 1). He was successful in one guess and
unsuccessful in (n - R). S = R - 1.

. . .

H(S=0): The subject did not know any of the answers and guessed at
all n items. He was successful in R guesses, unsuccessful in (n - R).
S = 0.

For each of the hypotheses we can compute a probability based
upon the binomial law, where p = 1/k is the probability of a successful
guess on an unknown item and q = 1 - p:

$$
\begin{aligned}
P(S=R) &= \frac{(n-R)!}{0!\,(n-R)!}\,p^{0}q^{\,n-R},\\
P(S=R-1) &= \frac{(n-R+1)!}{1!\,(n-R)!}\,p^{1}q^{\,n-R},\\
&\;\;\vdots\\
P(S=0) &= \frac{n!}{R!\,(n-R)!}\,p^{R}q^{\,n-R}.
\end{aligned}
\tag{3}
$$


The hypothesis with the greatest probability as calculated by 
Eqs. (3) is the one which we will accept, in the sense that we
consider it the one with the greatest likelihood of being the “true”
hypothesis. As an illustration, let us consider the case of 5 correct
answers out of 10 in a 5-response multiple-choice test. There are 6
hypotheses which may be examined: S=5, S=4, S=3, S=2, S=1,
and S=0. Applying Eqs. (3), we have:

$$
\begin{aligned}
P(S=5) &= \frac{5!}{0!\,5!}\,(.2)^{0}(.8)^{5} = .328,\\
P(S=4) &= \frac{6!}{1!\,5!}\,(.2)^{1}(.8)^{5} = .393,\\
P(S=3) &= \frac{7!}{2!\,5!}\,(.2)^{2}(.8)^{5} = .275,\\
P(S=2) &= \frac{8!}{3!\,5!}\,(.2)^{3}(.8)^{5} = .147,\\
P(S=1) &= \frac{9!}{4!\,5!}\,(.2)^{4}(.8)^{5} = .066,\\
P(S=0) &= \frac{10!}{5!\,5!}\,(.2)^{5}(.8)^{5} = .026.
\end{aligned}
$$



The probabilities listed above are interpreted as follows: If the hy- 
pothesis S=5 were true, we would expect an R of 5 with a relative 
frequency of about 33%; i.e., our subject through random guessing 
at the 5 items he did not know would be unsuccessful in all 5 ques- 
tions in about one-third of an extended series of independent at- 
tempts at the test. Similarly, an obtained R of 5 which includes one 
successful guess in 6 attempts would be expected to occur with a 
relative frequency of 39% in an extended series of guessing on 6 
unknown items. Since this is the largest of the computed values, we
accept the hypothesis S=4 as that which makes the obtained R the
“most probable.” (It should be noted that in making this estimate 
we are considering only integral values of S. Under our initial as- 
sumptions, we cannot entertain hypotheses of fractional S’s.) 
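
The illustration above can be checked numerically. The following is a minimal sketch, assuming every item is attempted and a guess succeeds with probability p = 1/k; the function name `hypothesis_probabilities` is ours and does not come from the paper. It evaluates one line of Eqs. (3) for each of the R + 1 hypotheses.

```python
from math import comb

def hypothesis_probabilities(n, R, k):
    """Return {S: probability of raw score R if S answers are known}, per Eqs. (3).

    n = number of items, R = raw score, k = alternatives per item.
    Assumption: all n items are attempted and each guess succeeds with p = 1/k.
    """
    p = 1.0 / k
    q = 1.0 - p
    probs = {}
    for S in range(R, -1, -1):
        guessed = n - S      # items the subject guessed at
        lucky = R - S        # guesses that happened to be right
        # Unlucky guesses number guessed - lucky = n - R for every hypothesis.
        probs[S] = comb(guessed, lucky) * p**lucky * q**(n - R)
    return probs

probs = hypothesis_probabilities(n=10, R=5, k=5)
for S, value in probs.items():
    print(f"P(S={S}) = {value:.3f}")

best = max(probs, key=probs.get)
print("most probable S:", best)
```

Under these assumptions the largest value falls at S = 4, in agreement with the text.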


By making a simple approximation we can arrive at an easier 
estimate. (In a later paragraph we shall consider the error involved 
in our approximation.) Each equation in (3) is one term of a bino- 
mial expansion, and the largest computed value by Eqs. (3) is at 
the mode of the binomial frequency distribution of which it is a term, 
provided that the distribution has a unique mode and provided R 
is sufficiently large relative to n that we do not need to consider 
negative S’s (R > np). If we accept the mean of the binomial, 
S + (n — S)p, as an estimate of its mode,* we conclude that the 
value of S which maximizes the probability of an obtained R is that 
value of S for which 

$$R = S + (n - S)p.$$

Then

$$S = \frac{R - np}{1 - p} = \frac{kR - n}{k - 1}. \tag{4}$$


Eq. (4) is identical with (2) above, and is the familiar formula 
to “correct for guessing.” It is a “maximum-likelihood” estimate 
when S is an integer in that no other hypothesis concerning the 
value of S can give a higher probability of obtaining the given R. 


The scoring formula does not assume that the subject answers 
every item on the test. We may replace n in Eq. (2) by m, letting 
m stand for the number of items attempted, or we may use the
familiar variation

$$S = R - \frac{W}{k - 1}, \tag{5}$$


where W is the number of wrong answers. The usefulness of Eq. 
(5) or an equivalent is obvious, since no scoring formula to adjust 
for chance success will alter the relative standings of individuals 
in a group when everyone attempts every item. Correcting for guess- 
ing is therefore rarely used unless some or all examinees do not com- 
plete the test. 
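
As a rough illustration of Eq. (5), the sketch below scores a few hypothetical examinees (the numbers are invented for the example, and `corrected_score` is our own name). It also shows why the correction cannot change rank order when every item is attempted: W is then n - R, so the corrected score is an increasing linear function of R.

```python
def corrected_score(right, wrong, k):
    """Conventional correction S = R - W/(k - 1); omitted items count neither way."""
    return right - wrong / (k - 1)

# Two hypothetical examinees on a 10-item, 5-alternative test.
guesser = corrected_score(right=7, wrong=3, k=5)   # attempted all 10 items
omitter = corrected_score(right=7, wrong=0, k=5)   # left 3 items blank
print(guesser, omitter)

# When everyone attempts all n items, W = n - R for each examinee,
# so raw and corrected scores order the group identically.
n, k = 10, 5
raw = [3, 9, 5, 7]
adjusted = [corrected_score(r, n - r, k) for r in raw]
rank_raw = sorted(range(len(raw)), key=raw.__getitem__)
rank_adj = sorted(range(len(adjusted)), key=adjusted.__getitem__)
same_order = rank_raw == rank_adj
print(same_order)
```

The two examinees with the same raw score receive different corrected scores only because one of them omitted items, which is the situation in which the correction affects relative standing.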


*The mode of the binomial, if it exists, is always within one unit of the
mean; in fact, it lies between Np - q and Np + p. If there is no unique
mode, Np - q and Np + p are consecutive integers which represent the
2 greatest frequencies in the distribution.


A few remarks are in order concerning the use of the conventional
scoring formula to provide estimates of this kind. It will be
recalled that in the derivation we used the mean of the binomial
distribution as an estimate of its mode. Since the distributions with
which we are dealing are discrete and in general non-symmetrical,
Eq. (2) is not exact. A study of Eqs. (3) will reveal the fact that
the probability for the hypothesis S = (R - np)/(1 - p) and that
for the hypothesis S = [(R - np)/(1 - p)] + 1 are equal when S
as computed by Eq. (2) is an integer and n > R ≥ np. This follows
from the theorem in probability theory that if $P_1$ and $P_2$ are the
probabilities of the most probable number of successes (actually
“failures” in our problem) in m and m + 1 trials, respectively, then
$P_1 \geq P_2$, the equality sign holding when (m + 1)p is an integer.
For example, in a true-false test of 10 items, the probability that an
individual with a true score of 6 will receive a raw score of 8 is
equal to

$$\frac{4!}{2!\,2!}\,(.5)^{4} = .3750.$$


The probability that a true score of 7 would give rise to a raw score 
of 8 is 

$$\frac{3!}{1!\,2!}\,(.5)^{3} = .3750.$$


Similarly, in a 5-response multiple-choice test of 10 items, a raw 
score of 2 can arise with equal likelihood from a true score of either 
0 or 1, since 

$$\frac{10!}{2!\,8!}\,(.2)^{2}(.8)^{8} = \frac{9!}{1!\,8!}\,(.2)^{1}(.8)^{8} = .30199.$$

In such cases there is no unique modal value for S which will maximize
the probability of R except when R = n, in which case the mode is at
S = n; or R < np, when the mode is at S = 0. (This latter case is
justification of the common practice of assigning an adjusted score
of zero to a raw score when the corrected value by Eq. (2) falls below
zero, since by Eqs. (3) S = 0 is the most probable true score.)
Since there is no unique modal value when the S computed by Eq. (2) is
an integer and n > R ≥ np, we may use the formula as it stands, increase
each integral S by one unit, or toss a coin in a given case to
determine whether to add 1 or use the calculated S.
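
The ties described in this paragraph can be confirmed directly. This is a small sketch under the same assumptions as Eqs. (3) (all items attempted, p = 1/k); `likelihood` is a helper name introduced here for illustration.

```python
from math import comb

def likelihood(n, R, k, S):
    """Probability of raw score R given S known answers (one line of Eqs. (3))."""
    p = 1.0 / k
    return comb(n - S, R - S) * p**(R - S) * (1 - p)**(n - R)

# True-false test, 10 items, raw score 8: Eq. (2) gives S = 6, which ties with S = 7.
tf_six = likelihood(10, 8, 2, 6)
tf_seven = likelihood(10, 8, 2, 7)
print(tf_six, tf_seven)

# Five alternatives, 10 items, raw score 2 (= np): Eq. (2) gives S = 0, which ties with S = 1.
mc_zero = likelihood(10, 2, 5, 0)
mc_one = likelihood(10, 2, 5, 1)
print(round(mc_zero, 5), round(mc_one, 5))
```

In both cases the two adjacent hypotheses are equally probable, so any of the three tie-breaking conventions mentioned above is defensible.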


When S as calculated by Eq. (2) is not an integer, a similar
problem arises. A consideration of Eqs. (3) along with the approximation
Eq. (2) reveals that the latter introduces a bias in such
cases. This bias arises from the fact that the distribution of S, i.e.,
the distribution of probabilities calculated by the R + 1 equations
in (3), is discrete and has a negative skew when p < 1/2. The modal
value of S is in such cases always the next integer above the S
calculated by Eq. (2) when the latter is a fraction. For example, in
the case of 7 correct answers out of 10 in a 5-response multiple-choice
test, the S calculated by the correction formula is 7 - 3/4 = 6.25.
The probabilities computed by Eqs. (3) are .4096 for S = 6 and
.5120 for S = 7. Thus, although 6.25 is nearer 6 than 7, the most
probable value for the corresponding true score is 7. Adjusted scores
calculated by Eq. (2) which are fractions should therefore be raised
to the next whole number, unless p > 1/2, in which case S should be
reduced to the next lower integer. This latter situation would arise if,
say, a test were composed of items each having 4 alternatives of which
3 were correct, and the examinee is instructed to mark only one
alternative for each item. The distribution of S would in this case be
positively skewed, and the use of Eq. (2) would overestimate S.
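
The rounding rule for the case p < 1/2 can be illustrated on the 10-item, 5-response test used in the text. The sketch below is only a check on that one configuration, not a general proof; `most_probable_true_score` is a name chosen here. It compares the integer S that maximizes Eqs. (3) with the fractional value from Eq. (2) rounded up.

```python
from math import comb, ceil

def most_probable_true_score(n, R, k):
    """Integer S in 0..R that maximizes the Eqs. (3) probability of raw score R."""
    p = 1.0 / k

    def prob(S):
        return comb(n - S, R - S) * p**(R - S) * (1 - p)**(n - R)

    return max(range(R + 1), key=prob)

n, k = 10, 5
rows = []
for R in range(n + 1):
    s_formula = (k * R - n) / (k - 1)                    # Eq. (2)
    if s_formula > 0 and s_formula != int(s_formula):    # fractional, positive cases only
        rows.append((R, s_formula, most_probable_true_score(n, R, k), ceil(s_formula)))

for R, s_formula, mode, rounded_up in rows:
    print(f"R={R}  Eq.(2)={s_formula:.2f}  most probable S={mode}  rounded up={rounded_up}")
```

For every fractional case in this configuration the most probable integer S coincides with the Eq. (2) value raised to the next whole number, including the text's example of R = 7.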


2. Estimating the Mean “True Score” in a Series of Independent 
Trials at the Test (or at Parallel Tests) 


It might appear at first that this problem, estimating an indi- 
vidual’s mean true score over an indefinitely large number of trials, 
can be solved by the method of Hamilton—i.e., by using the regres- 
sion of true scores upon raw scores in a sample of individuals, basing 
calculations upon the known or assumed distribution of true scores 
(provided we could somehow arrive at a satisfactory idea of the true 
score distribution). However, if the problem of correcting for guess- 
ing is considered as a problem in regression, the appropriate esti- 
mate of an individual’s true score is the mean of a number of esti- 
mates for that individual rather than the mean of the estimates for 
a group of individuals. It is the individual’s mean toward which his 
scores regress, not the group mean. An analogy may clarify the 
point: Suppose that we ask a person to take out of his pockets all 
the coins which he happens to have and to throw them onto a table. 
Then we ask him to report the number of coins which fell in such a 
way that the heads are uppermost. If we are interested in estimat- 
ing the total number of coins on the table from the number showing 
heads, the information we have is sufficient. We would not seek to 
improve our estimate by asking other individuals to empty their
pockets and report the number of heads showing on their coins unless
we knew that everyone had the same number of coins to start
with—which would make the experiment equivalent to a repetition 
of the procedure with the same person. 


Taking this view of the estimation problem, we return to Hamilton’s
derivation on pages 153 and 154 and re-define N to be the number
of independent trials at the test (or at parallel tests) by Individual i
rather than the number of subjects in the group. We see that
Hamilton’s Eq. (3),

$$\bar{S} = \frac{k\bar{R} - n}{k - 1}, \qquad \text{[Hamilton’s Eq. (3)]}$$

which he derives as the mean true score for a group of N individuals,
turns out to be the expression for the mean estimated score of a single
individual over a series of N trials. But ordinarily we have only
one trial from which to make our estimate and only one $R_i$. This
single $R_i$, however, is an unbiased estimate of $\bar{R}_i$ and is the best
estimate we have of $\bar{R}_i$ in the absence of other scores for that
individual. The substitution of $R_i$ for $\bar{R}_i$ in our modification of
Hamilton’s Eq. (3) yields the familiar scoring formula, Eq. (2). Similar
substitutions in Hamilton’s own recommended correction formulas, 
Eq. (15), page 156, and Eq. (22), page 157, also yield the conven- 
tional correction formula. The traditional scoring formula, then, 
yields an unbiased estimate of the individual’s “true score,” defined 
as the limit approached by the mean estimated score as the number 
of independent trials increases. This estimate is not adjusted to an 
integral value as is the single-trial estimate discussed above. 
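
The unbiasedness claim can be checked by computing the exact expectation of the corrected score over the binomial distribution of lucky guesses. This is a sketch under the paper's assumptions (S answers known, the remaining n - S items guessed with p = 1/k); the function name `expected_corrected_score` is ours.

```python
from math import comb

def expected_corrected_score(n, k, S):
    """Exact mean of (kR - n)/(k - 1) when S answers are known and n - S are guessed."""
    p = 1.0 / k
    total = 0.0
    for lucky in range(n - S + 1):                   # number of successful guesses
        prob = comb(n - S, lucky) * p**lucky * (1 - p)**(n - S - lucky)
        R = S + lucky                                # raw score on this outcome
        total += prob * (k * R - n) / (k - 1)        # Eq. (2) applied to that raw score
    return total

for S in (0, 4, 10):
    print(S, expected_corrected_score(10, 5, S))
```

Up to floating-point error the expectation equals S in each case, since the mean raw score is S + (n - S)/k and Eq. (2) is linear in R.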


Hamilton’s analysis of the problem differs fundamentally from 
that presented above. He apparently is seeking to minimize errors 
of estimate over a sample of individuals by estimating the true score 
for an examinee as the mean true score for all individuals in the 
sample who have the same raw score. A consequence of Hamilton’s 
method is that a subject’s estimated score depends upon the
distribution of scores in the group with which he is examined and will
vary as the general level of performance of the group.* Hence it
will ordinarily be impossible for anyone to earn a perfect score (even
the person who constructed the test!) if Hamilton marks the papers,
unless everyone makes a perfect score, in which case the test ceases
to be a test in the usual meaning of the term. The conventional
scoring method, in which the individual’s set of responses is the
sample and his own true score the desired population value, does not
encounter such a difficulty. The independent variables include only the
subject’s performance and the structure of the test. The distribution
of ability in the group (or even the existence of a group) is
irrelevant.

*Some idea of this dependence can be gained through applying Hamilton’s
basic scoring formula, Eq. (1) above, to the case of an individual who has 75
correct responses on a true-false test of 100 items. By the conventional
correction formula, his adjusted score is 75 - 25 = 50, regardless of the mean
performance of the group or regardless of whether he was even examined with
a group.
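
The contrast drawn here can be made concrete with the footnote's own case of 75 correct responses on a 100-item true-false test. The sketch below simply evaluates Eq. (1) and Eq. (2) as reconstructed above; the function names are ours, and the group means are the assumed values used in the footnote.

```python
def conventional_score(R_i, n, k):
    """Eq. (2): depends only on the individual's raw score and the test."""
    return (k * R_i - n) / (k - 1)

def hamilton_score(R_i, group_mean, n, k):
    """Eq. (1): the raw score multiplied by a coefficient that depends on the group mean."""
    return (k * group_mean - n) * R_i / ((k - 1) * group_mean)

n, k, R_i = 100, 2, 75
print("conventional:", conventional_score(R_i, n, k))

estimates = {mean: round(hamilton_score(R_i, mean, n, k)) for mean in (90, 80, 70, 60, 50)}
for mean, estimate in estimates.items():
    print(f"group mean {mean}: estimated score {estimate}")

# A perfect raw score is scaled below 100 whenever the group mean is below 100.
print(hamilton_score(100, 90, n, k))
```

The conventional score stays at 50 whatever the group does, while the regression estimate moves with the assumed group mean.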

It is true, of course, that scores generally will tend to regress to- 
ward the group mean upon retest. This phenomenon is a consequence 
of the less-than-perfect reliability of the testing instrument and is 
observed in connection with tests of all kinds, whether of the “objec- 
tive” type or the “free-answer” type in which the probability of 
chance success is presumably zero. Regression methods (which re- 
quire the standard error of measurement or the reliability coefficient) 
may be used to adjust scores for this effect; but the problem of reli- 
ability and the question of what adjustment, if any, should be made 
to test scores on account of their less-than-perfect reliability have 
been ignored in this paper. Reliability is considered to be a separate 
(though not unrelated) matter and one which requires other assump- 
tions and other information for its treatment than does the simpler 
problem of adjusting for chance success. 

*If we substitute several values for a group mean in Hamilton’s equation,
Eq. (1), we get estimates such as:

Assumed Group Mean    Estimated Score for Individual with R_i = 75
        90                            67
        80                            56
        70                            43
        60                            25
        50                             0

In conclusion, it should be recognized that the assumptions employed
in the correction for guessing (perfect knowledge or perfect
ignorance on each item and “pure” random guessing at unknown
items) are only approximately appropriate for most objective tests.
“Partial” knowledge, positional response tendencies, the ability to
eliminate one or more alternatives, the lack of independence among
items or among alternatives within an item (as, for example, when
alternatives are ordered along a continuum), varying degrees of
“willingness to gamble”: these and other circumstances weaken the
validity of our mathematical model, as the results of almost any item
analysis will show. The effects of these factors cannot be determined
from an inspection of raw test scores, and it is doubtful that any
general scoring formulas can be found which will take them into
account. It may be possible, however, to devise empirical scoring
methods which are approximately valid for certain kinds of tests and for
certain classes of individuals.
The problem considered is the use of a set of measurements
on an individual to decide from which of several populations he has 
been drawn. It is assumed that in each population there is a prob- 
ability distribution of the measurements. Principles for choosing 
the rule of classification are based on costs of misclassification. Op- 
timum procedures are derived in general terms. If the measure- 
ments are normally distributed, the procedures use one discriminant 
function in the case of two populations and several discriminant 
functions in the cases of more populations. The numerical example 
given involves three normal populations. 


1. The Problem of Classification. 

The problem of classification which we shall consider arises when 
an investigator makes a number of measurements on an individual 
and wishes to classify this individual into one of several categories 
on the basis of these measurements. The investigator cannot iden- 
tify the individual with a category directly but must make his in- 
ference from these measurements. In many cases it can be assumed 
that there are a finite number of categories or populations from 
which the individual may have come, and each population is charac- 
terized by a probability distribution of the measurements. An in- 
dividual is considered as a random observation from this population. 
The question is: Given an individual with certain measurements, 
from which population did he arise? 

The problem of classification may be considered as a problem of
“statistical decision functions.” We have a number of hypotheses;
each hypothesis is that the distribution of the observation is a given
one. We must accept one of these hypotheses and reject the others.
If only two populations are admitted, we have an elementary problem
of testing one hypothesis of a specified distribution against
another.†


*Sponsored in part by the Office of Naval Research.

†The general theory described in this paper can be deduced as a special case
of A. Wald’s theory (9). M. A. Girshick presented some of this theory to the
meeting of the Institute of Mathematical Statistics at Berkeley, June 16, 1949,
in “Bayes, Minimax and Other Approaches to Multiple Classification Problems,”
and G. W. Brown (2) described some of the results before the American
Statistical Association at Cleveland, December 27, 1948.



In some instances, the categories are specified beforehand in the 
sense that the probability distributions of the measurements are com- 
pletely known. In other cases, the form of each distribution may be 
known, but the parameters of the distribution must be estimated 
from a sample from that population. For instance, the two popula- 
tions may be multivariate normal with means, variances, and corre- 
lations unknown. 

We can give an example of our problem from the field of educa- 
tion. Prospective students applying for admission into college are 
given a battery of tests. The scores on these tests for a given student 
form a set of measurements, $x_1, \ldots, x_p$. The prospective student
may be a member of one population consisting of those students who
will successfully complete college training,* or he may be a member
of the other population, those who will not complete the college course 
successfully. The problem is to classify each student applying for 
admission on the basis of his scores on the entrance examinations. 


2. Standards of Good Classification. 

To begin with we shall suppose that an individual with certain 
measurements $(x_1, \ldots, x_p)$ has been drawn from one of two 
populations, $\pi_1$ and $\pi_2$. The properties of these two populations 
are specified by the probability density functions (or frequency 
functions), $p_1(x_1, \ldots, x_p)$ and $p_2(x_1, \ldots, x_p)$, respectively.† 
We wish to define a procedure of classifying this individual as coming 
from $\pi_1$ or $\pi_2$. The set of measurements $x_1, \ldots, x_p$ can be 
presented as a point in a $p$-dimensional space. We shall divide this 
space into two regions, $R_1$ and $R_2$. If the point corresponding to an 
individual falls in $R_1$ we shall say the individual was drawn from 
$\pi_1$, and if it falls in $R_2$ we shall say he came from $\pi_2$. 

We wish to select these two regions so that we minimize on the 
average the bad effects of misclassification. In following a given 
classification procedure the statistician can make two kinds of errors. 
If the individual is actually from $\pi_1$, the statistician can classify him 
as coming from population $\pi_2$, or if he is from $\pi_2$ the statistician may 
classify him as from $\pi_1$. We need to know the relative undesirability 
of these two kinds of misclassification. Let the “cost” of the first 
type of misclassification be $C(2|1)$, and let the cost of misclassifying 
an individual from $\pi_2$ as from $\pi_1$ be $C(1|2)$. These costs may be 


*To avoid raising the question of prediction it might be better to say that 
this population consists of those students with potentialities for successfully 
completing college training. 

†Each infinite population is an idealization of the population of all possible 
observations. 

measured in any kind of units. As we shall see later, it is only the 
ratio of the two costs that is important. While the statistician may 
not know these costs in each case, he will usually have at least a 
rough idea of them. The following two by two table indicates the 
costs of correct and incorrect classification. 


                                  Population Drawn From 
                                  $\pi_1$        $\pi_2$ 
Statistician’s      $\pi_1$          0          $C(1|2)$ 
Decision            $\pi_2$       $C(2|1)$         0 


We shall consider two ways of defining “minimum costs.” Which 
definition should be used will depend on the knowledge one has of the 
situation. Let $R$ denote the rule of classification; the rule implies a 
division of the $p$-dimensional observation space into the two regions 
$R_1$ and $R_2$. If the observation is drawn from $\pi_1$, the probability of 
correct classification, $P(1|1,R)$, is the probability of falling into $R_1$, 
and the probability of misclassification, $P(2|1,R)$, is the probability 
of falling into $R_2$. For instance, 

$$P(1|1,R) = \int_{R_1} p_1(x_1, \ldots, x_p)\, dx_1 \cdots dx_p. \tag{1}$$

Similarly, if the observation is drawn from $\pi_2$, the probability of 
correct classification is $P(2|2,R)$, the integral of $p_2(x_1, \ldots, x_p)$ over 
$R_2$, and the probability of misclassification is $P(1|2,R)$. If the 
observation is drawn from $\pi_1$, there is a loss when the observation is 
incorrectly classified as coming from $\pi_2$; the expected loss, or risk, is 
the product of the cost of a mistake times the probability of making it, 

$$r(1,R) = C(2|1)\,P(2|1,R). \tag{2}$$


In the same way we see that when the observation is from $\pi_2$, the 
expected loss due to misclassification is 

$$r(2,R) = C(1|2)\,P(1|2,R). \tag{3}$$


In many cases we have a priori probabilities of drawing an 
observation from one or the other population. Suppose that the 
probability of drawing from $\pi_1$ is $q_1$ and from $\pi_2$ is $q_2$. Then the expected 
loss due to misclassification is the sum of the products of the 
probability of drawing from each population times the expected loss for 
that population. It is 

$$q_1 r(1,R) + q_2 r(2,R) = q_1 C(2|1) P(2|1,R) + q_2 C(1|2) P(1|2,R). \tag{4}$$

We wish to choose our regions, $R_1$ and $R_2$, to minimize this expected 
loss. 

In the example mentioned earlier one “cost of misclassification” 
is a measure of the undesirability of starting a student through 
college when he will not be able to finish, and the other is a measure of 
the undesirability of refusing to admit a student who can complete 
his course. If a group of students are given the tests before the clas- 
sification procedure is inaugurated and all are allowed to enter, esti- 
mates of the probabilities of drawing from the two populations are 
obtained from the relative frequencies of students who complete and 
who do not complete college. 

If we do not have a priori probabilities of drawing from $\pi_1$ and 
$\pi_2$, we cannot write down (4). Then we can only speak of the 
expected loss if the observation is drawn from $\pi_1$ or if the observation 
is drawn from $\pi_2$. For a given procedure $R$, the less desirable case 
is to have a drawing from the population with the greater risk. A 
conservative principle to follow is to choose our procedure so as to 
minimize the maximum risk. This is the so-called minimax principle. 
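
The two standards can be compared on a small numerical sketch. The costs, misclassification probabilities, and the two candidate rules below are hypothetical values chosen only for illustration; the function names are not from the paper.

```python
def risks(cost_2_given_1, cost_1_given_2, p_2_given_1, p_1_given_2):
    """Return (r(1,R), r(2,R)) as in equations (2) and (3)."""
    return cost_2_given_1 * p_2_given_1, cost_1_given_2 * p_1_given_2


def expected_loss(q1, rule_risks):
    """Equation (4), with q2 = 1 - q1."""
    r1, r2 = rule_risks
    return q1 * r1 + (1.0 - q1) * r2


def maximum_risk(rule_risks):
    """The quantity the minimax principle tries to make small."""
    return max(rule_risks)


# Two hypothetical rules with C(2|1) = 1 and C(1|2) = 2.
rule_a = risks(1.0, 2.0, 0.10, 0.30)
rule_b = risks(1.0, 2.0, 0.25, 0.15)

# With a priori probability q1 = 0.9 the expected-loss standard picks one rule,
# while the minimax standard, which ignores q1, picks the other.
best_by_expected_loss = min([rule_a, rule_b], key=lambda r: expected_loss(0.9, r))
best_by_minimax = min([rule_a, rule_b], key=maximum_risk)
```

The example is meant only to show that the two definitions of "minimum cost" can disagree when one population is much more likely than the other.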


3. Procedures of Classification into One of Two Populations with 
Known Probability Distributions. 


Now let us see how we find the regions which give us the 
minimum expected loss. First we shall treat the case when a priori 
probabilities of drawing from $\pi_1$ and $\pi_2$ are known. Since we have 
a priori probabilities we can define joint probabilities of drawing 
from a given population and observing a set of variables within given 
ranges. The probability that an observation comes from $\pi_1$ and that 
the $i$th variate is between $x_i$ and $x_i + dx_i$ is approximately 
$q_1 p_1(x_1, \ldots, x_p)\, dx_1 \cdots dx_p$. Similarly, the probability of drawing 
from $\pi_2$ and obtaining an observation with the $i$th variate falling between 
$x_i$ and $x_i + dx_i$ is approximately $q_2 p_2(x_1, \ldots, x_p)\, dx_1 \cdots dx_p$. If we 
now have an observation $x_1, \ldots, x_p$, the conditional probability that 
it comes from $\pi_1$ is 

$$\frac{q_1 p_1(x_1, \ldots, x_p)}{q_1 p_1(x_1, \ldots, x_p) + q_2 p_2(x_1, \ldots, x_p)}, \tag{5}$$

and the conditional probability that it comes from $\pi_2$ is 

$$\frac{q_2 p_2(x_1, \ldots, x_p)}{q_1 p_1(x_1, \ldots, x_p) + q_2 p_2(x_1, \ldots, x_p)}. \tag{6}$$

Suppose for a moment that $C(1|2) = C(2|1) = 1$; then the 
expected loss is simply the probability of misclassification. For a given 
observation $x_1, \ldots, x_p$ we minimize the probability of misclassification 
if we assign it to the population with the greater a posteriori 
probability; that is, if (5) is greater than (6) we say the observation 
came from $\pi_1$, and if (5) is less than (6) we say it came from $\pi_2$. 
Since the denominator of (5) is the same as that of (6) our rule for 
classification gives us a division of the space into $R_1$ and $R_2$ 
according to the following: 

$$\begin{aligned} R_1 &: \; q_1 p_1(x_1, \ldots, x_p) > q_2 p_2(x_1, \ldots, x_p), \\ R_2 &: \; q_1 p_1(x_1, \ldots, x_p) < q_2 p_2(x_1, \ldots, x_p). \end{aligned} \tag{7}$$

$R_1$ consists of points satisfying the first inequality, and $R_2$ consists 
of points satisfying the second.* 


If $C(2|1) \neq C(1|2)$, 

$$\frac{C(2|1)\, q_1 p_1(x_1, \ldots, x_p)}{q_1 p_1(x_1, \ldots, x_p) + q_2 p_2(x_1, \ldots, x_p)}$$

is the conditional expected loss if we classify an observation 
$x_1, \ldots, x_p$ into $\pi_2$, and 

$$\frac{C(1|2)\, q_2 p_2(x_1, \ldots, x_p)}{q_1 p_1(x_1, \ldots, x_p) + q_2 p_2(x_1, \ldots, x_p)}$$

is the conditional expected loss if we classify this observation into $\pi_1$. 
We minimize the expected loss for this particular observation if we 
classify it to obtain the lower expected loss. The regions are 
characterized as follows: 

$$\begin{aligned} R_1 &: \; C(2|1)\, q_1 p_1(x_1, \ldots, x_p) > C(1|2)\, q_2 p_2(x_1, \ldots, x_p), \\ R_2 &: \; C(2|1)\, q_1 p_1(x_1, \ldots, x_p) < C(1|2)\, q_2 p_2(x_1, \ldots, x_p). \end{aligned} \tag{8}$$

We could also write this as† 


*The case of equality in (7) can be neglected since we assume the density 
functions are such that the probability of equality is zero. 


†These results were first obtained in this way by Welch (10) for the case of 
equal costs of misclassification. 

$$\begin{aligned} R_1 &: \; \frac{p_1(x_1, \ldots, x_p)}{p_2(x_1, \ldots, x_p)} > \frac{C(1|2)\, q_2}{C(2|1)\, q_1}, \\ R_2 &: \; \frac{p_1(x_1, \ldots, x_p)}{p_2(x_1, \ldots, x_p)} < \frac{C(1|2)\, q_2}{C(2|1)\, q_1}. \end{aligned} \tag{9}$$

This is the “Bayes solution.” 
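
As a minimal sketch of the Bayes rule, the function below classifies a single observation by comparing the two sides of the cost-weighted inequality (form (8), which avoids dividing by a density that may be zero). The univariate normal densities in the usage lines are an illustrative assumption, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Univariate normal density, used here only as an example density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_classify(x, p1, p2, q1, q2, cost_2_given_1=1.0, cost_1_given_2=1.0):
    """Return 1 if x falls in R_1 and 2 otherwise.

    p1 and p2 are callables giving the two densities at x;
    q1 and q2 are the a priori probabilities.
    """
    if cost_2_given_1 * q1 * p1(x) > cost_1_given_2 * q2 * p2(x):
        return 1
    return 2


# Example densities (assumed): population 1 is N(1, 1), population 2 is N(-1, 1).
def p1(x):
    return normal_pdf(x, 1.0, 1.0)


def p2(x):
    return normal_pdf(x, -1.0, 1.0)


equal_priors = bayes_classify(0.5, p1, p2, 0.5, 0.5)
rare_population_1 = bayes_classify(0.5, p1, p2, 0.1, 0.9)
```

With equal a priori probabilities the point 0.5 is assigned to the first population; when the first population is rare, the same point is assigned to the second, because the likelihood ratio no longer exceeds the threshold.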


These inequalities seem intuitively reasonable. If the probability 
of drawing from $\pi_1$ is decreased or if the cost of misclassifying 
into $\pi_2$ is decreased, the inequality in (9) for $R_1$ is satisfied by fewer 
points. Since the regions depend on $q_1$ and $q_2$, the expected loss does 
also. The curve A in Figure 1 indicates how the expected loss may 
vary with $q_1$ (and $q_2 = 1 - q_1$). 


FIGURE 1 (expected loss plotted against $q_1$) 


It may very well happen that the statistician errs in assigning 
his a priori probabilities. Suppose that the statistician used $q_1$ and 
$q_2\,(= 1 - q_1)$ when $q_1^{*}$ and $q_2^{*}\,(= 1 - q_1^{*})$ are the actual probabilities 
of drawing from $\pi_1$ and $\pi_2$, respectively. Then the actual expected 
loss is 

$$q_1^{*}\, C(2|1) P(2|1,R) + (1 - q_1^{*})\, C(1|2) P(1|2,R), \tag{10}$$

where $R_1$ and $R_2$ are based on $q_1$ and $q_2$. Given the regions $R_1$ and 
$R_2$, this is a linear function of $q_1^{*}$, graphed as B. This line touches A 
at $q_1^{*} = q_1$. It cannot go below A because for $q_1^{*} \neq q_1$ the best 
regions are defined by (9) for $q_1 = q_1^{*}$ and $q_2 = 1 - q_1^{*}$. From the 
graph we can guess that a small error in $q_1$ is not very important. 
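
The relation between the curve A and the line B can be checked numerically in a simple case. The sketch below assumes two univariate normal populations with unit variance and means `mu1 > mu2` (an illustrative choice, not part of the text), for which the region $R_1$ of (9) is the half-line to the right of a cut-off point.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def actual_loss(q1_assumed, q1_actual, mu1=1.0, mu2=-1.0, c21=1.0, c12=1.0):
    """Expected loss (10) when the regions are built from q1_assumed
    but the true a priori probability of population 1 is q1_actual."""
    k = c12 * (1.0 - q1_assumed) / (c21 * q1_assumed)
    # For unit-variance normals, p1/p2 > k is equivalent to x > cut.
    cut = 0.5 * (mu1 + mu2) + math.log(k) / (mu1 - mu2)
    p_2_given_1 = phi(cut - mu1)          # population 1 falls in R_2
    p_1_given_2 = 1.0 - phi(cut - mu2)    # population 2 falls in R_1
    return q1_actual * c21 * p_2_given_1 + (1.0 - q1_actual) * c12 * p_1_given_2


bayes_loss = actual_loss(0.5, 0.5)        # a point on curve A
mistaken_loss = actual_loss(0.3, 0.5)     # a point on a line B built from q1 = 0.3
```

Here `mistaken_loss` is never smaller than `bayes_loss`, and for a fixed assumed value the loss is linear in the actual probability, which is the behaviour described for line B.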

Now let us turn to the case where the statistician cannot assign 
a priori probabilities to the two populations. Then he may choose a 
procedure that minimizes the maximum expected loss. It can be 
shown (by the Neyman-Pearson Fundamental Lemma) that the best 
regions of classification are of the form 

$$\begin{aligned} R_1 &: \; \frac{p_1(x_1, \ldots, x_p)}{p_2(x_1, \ldots, x_p)} > k, \\ R_2 &: \; \frac{p_1(x_1, \ldots, x_p)}{p_2(x_1, \ldots, x_p)} < k, \end{aligned} \tag{11}$$

where $k$ is suitably chosen. It should be noticed that for any 
particular $k$ there are a priori probabilities $q_1$ and $q_2$ satisfying 

$$k = \frac{C(1|2)\, q_2}{C(2|1)\, q_1}. \tag{12}$$

Thus every minimax solution is a Bayes solution for some a priori 
probabilities. Since $R_2$ increases as $k$ increases, and hence $r(1,R)$ 
increases as $k$ increases, and at the same time $r(2,R)$ decreases, the 
choice of $k$ giving the minimax solution is the one for which* 

$$r(1,R) = r(2,R). \tag{13}$$

This is then the average loss, for it is immaterial which population 
is drawn from. The graph of the risk against a priori probability 
$q_1$ is, therefore, a horizontal line (labelled C). Since there is 
one value of $q_1$, say $\bar{q}_1$, satisfying (12), the line C must touch A. 


*As for the Bayes solution we assume that the densities are such that the 
probability of equality in (11) is zero. 


4. Classification into One of Two Known Multivariate Normal 
Populations. 

One of the most interesting examples of the general theory is 
in the case that the populations have multivariate normal distributions 
with the same set of variances and correlations, but with different 
sets of means. Suppose that $x_1, \ldots, x_p$ have a joint normal 
distribution with means in $\pi_1$ of $\mathcal{E} x_i = \mu_i^{(1)}$, and in $\pi_2$ of $\mathcal{E} x_i = \mu_i^{(2)}$. 
Let the common set of variances and correlations be $\sigma_1^2, \ldots, \sigma_p^2$, 
$\rho_{12}, \rho_{13}, \ldots, \rho_{p-1,p}$. We shall find it convenient to write (11) as 

$$\begin{aligned} R_1 &: \; \log \frac{p_1(x_1, \ldots, x_p)}{p_2(x_1, \ldots, x_p)} > \log k, \\ R_2 &: \; \log \frac{p_1(x_1, \ldots, x_p)}{p_2(x_1, \ldots, x_p)} < \log k. \end{aligned} \tag{14}$$

It is easily verified that in this particular case 

$$\log \frac{p_1(x_1, \ldots, x_p)}{p_2(x_1, \ldots, x_p)} = \sum_{i=1}^{p} \lambda_i x_i - \tfrac{1}{2} \sum_{i=1}^{p} \lambda_i \left(\mu_i^{(1)} + \mu_i^{(2)}\right), \tag{15}$$

where $\lambda_1, \ldots, \lambda_p$ is the solution of 

$$\sum_{j=1}^{p} \sigma_i \sigma_j \rho_{ij} \lambda_j = \mu_i^{(1)} - \mu_i^{(2)}, \qquad i = 1, \ldots, p. \tag{16}$$

The first term in (15) is the well-known discriminant function 
obtained by Fisher (3) by choosing the linear function for which the 
difference in expected values for the two populations relative to the 
standard deviation is a maximum. The regions are given by 

$$\begin{aligned} R_1 &: \; \sum_{i=1}^{p} \lambda_i x_i > \tfrac{1}{2} \sum_{i=1}^{p} \lambda_i \left(\mu_i^{(1)} + \mu_i^{(2)}\right) + \log k, \\ R_2 &: \; \sum_{i=1}^{p} \lambda_i x_i < \tfrac{1}{2} \sum_{i=1}^{p} \lambda_i \left(\mu_i^{(1)} + \mu_i^{(2)}\right) + \log k. \end{aligned} \tag{17}$$

If we assign a priori probabilities, then 

$$k = \frac{C(1|2)\, q_2}{C(2|1)\, q_1}. \tag{18}$$

In particular, if $k = 1$ (for example, if $C(1|2) = C(2|1)$ and $q_1 = 
q_2 = \tfrac{1}{2}$), $\log k = 0$, and the procedure is to compare the discriminant 
function of the observations with the discriminant function of the 
averages of the respective means. 
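
The computation in (15) to (17) can be sketched as follows. The linear equations (16) are solved here by a small Gaussian elimination routine, which is a choice made for this illustration; the function names and the two-variate example values are assumptions, not taken from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def discriminant(sd, corr, mu1, mu2):
    """Return (lambda, constant): the coefficients solving (16) and the
    constant term (1/2) sum lambda_i (mu1_i + mu2_i) of (15)."""
    p = len(sd)
    cov = [[sd[i] * sd[j] * corr[i][j] for j in range(p)] for i in range(p)]
    lam = solve_linear(cov, [mu1[i] - mu2[i] for i in range(p)])
    const = 0.5 * sum(lam[i] * (mu1[i] + mu2[i]) for i in range(p))
    return lam, const


def classify(x, lam, const, log_k=0.0):
    """Apply (17): return 1 if the observation falls in R_1, else 2."""
    u = sum(l * xi for l, xi in zip(lam, x)) - const
    return 1 if u > log_k else 2


# Assumed example: two uncorrelated unit-variance variates, means (1, 0) and (-1, 0).
lam, const = discriminant([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [-1.0, 0.0])
```

In this example only the first variate separates the populations, so the second coefficient is zero and the second measurement has no influence on the decision.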

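If we do not know a priori probabilities, we wish to find $\log k 
= c$, say, so that the expected loss when the observation is from 
$\pi_1$ is equal to the expected loss when the observation is from $\pi_2$. 
The probabilities of misclassification can be computed from the 
distribution of 

$$U = \sum_{i=1}^{p} \lambda_i x_i - \tfrac{1}{2} \sum_{i=1}^{p} \lambda_i \left(\mu_i^{(1)} + \mu_i^{(2)}\right)$$

when $x_1, \ldots, x_p$ are from $\pi_1$ and when $x_1, \ldots, x_p$ are from $\pi_2$. Let $\alpha$ 
be the “distance” between $\pi_1$ and $\pi_2$, 

$$\alpha = \sum_{i=1}^{p} \lambda_i \left(\mu_i^{(1)} - \mu_i^{(2)}\right). \tag{19}$$

The distribution of $U$ is normal with the variance $\alpha$. If the 
observation is from $\pi_1$ the mean of $U$ is $\tfrac{1}{2}\alpha$; if the observation is from $\pi_2$ 
the mean is $-\tfrac{1}{2}\alpha$. 

The probability of misclassification if the observation is from 
$\pi_1$ is 

$$P(2|1,R) = \int_{-\infty}^{c} \frac{1}{\sqrt{2\pi\alpha}}\, e^{-\frac{(z - \frac{1}{2}\alpha)^2}{2\alpha}}\, dz = \int_{-\infty}^{(c - \frac{1}{2}\alpha)/\sqrt{\alpha}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\, dy, \tag{20}$$

and the probability of misclassification if the observation is from 
$\pi_2$ is 

$$P(1|2,R) = \int_{c}^{\infty} \frac{1}{\sqrt{2\pi\alpha}}\, e^{-\frac{(z + \frac{1}{2}\alpha)^2}{2\alpha}}\, dz = \int_{(c + \frac{1}{2}\alpha)/\sqrt{\alpha}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\, dy. \tag{21}$$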


Figure 2 indicates the two probabilities as the shaded portion in 
the tails. 




















FIGURE 2 

We want to choose $c$ so that 

$$r(2,R) = C(1|2) \int_{(c + \frac{1}{2}\alpha)/\sqrt{\alpha}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\, dy = C(2|1) \int_{-\infty}^{(c - \frac{1}{2}\alpha)/\sqrt{\alpha}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\, dy = r(1,R). \tag{22}$$

It should be noted that if the costs of misclassification are equal, 
$c = 0$ and the probability of misclassification is 

$$\int_{\sqrt{\alpha}/2}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}y^2}\, dy. \tag{23}$$

In case the costs of misclassification are unequal, $c$ could be 
determined to sufficient accuracy by a trial and error method with the 
normal tables. 
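
The trial and error search for $c$ can be mechanized. The sketch below replaces the normal tables by the error function and uses bisection, relying on the fact that $r(1,R) - r(2,R)$ increases with $c$; bisection is a substitute chosen for this illustration, and the function names and the value of `alpha` in the usage lines are assumptions.

```python
import math


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def risk_gap(c, alpha, c21, c12):
    """r(1,R) - r(2,R) for the cut-off c, using (20) and (21)."""
    root = math.sqrt(alpha)
    r1 = c21 * phi((c - 0.5 * alpha) / root)
    r2 = c12 * (1.0 - phi((c + 0.5 * alpha) / root))
    return r1 - r2


def minimax_c(alpha, c21=1.0, c12=1.0, tol=1e-12):
    """Find c satisfying (22) by bisection on an interval that brackets the root."""
    half_width = 0.5 * alpha + 10.0 * math.sqrt(alpha)
    lo, hi = -half_width, half_width
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if risk_gap(mid, alpha, c21, c12) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c_equal_costs = minimax_c(4.0)
c_unequal_costs = minimax_c(4.0, c21=1.0, c12=2.0)
```

With equal costs the search returns a value of $c$ that is zero to within the tolerance, in agreement with the remark preceding (23); when misclassifying an individual from the second population is the costlier error, $c$ moves upward so that the first region shrinks.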

In passing let us note that if the set of variances and 
correlations in one population is not the same as the set in the other 
population we can still apply the general theory. In this case 

$$\log \frac{p_1(x_1, \ldots, x_p)}{p_2(x_1, \ldots, x_p)}$$

is not a linear function of $x_1, \ldots, x_p$, but a quadratic function. If 
we have a priori probabilities we can give an explicit solution to the 
problem. 


5. Classification into One of Two Multivariate Normal Populations 
when the Parameters are Estimated. 


Thus far we have assumed that the two populations are known 
exactly. In most applications of this theory the populations are not 
known, but must be inferred from samples, one from each popula- 
tion. We shall now treat the case in which we have a sample from 
each of two normal populations and we wish to use that informa- 
tion in classifying another observation as coming from one of the 
two populations. 

Suppose we have a sample $(x_{1\gamma}^{(1)}, \ldots, x_{p\gamma}^{(1)})$ $(\gamma = 1, \ldots, N^{(1)})$ 
from $\pi_1$ and a sample $(x_{1\gamma}^{(2)}, \ldots, x_{p\gamma}^{(2)})$ $(\gamma = 1, \ldots, N^{(2)})$ from $\pi_2$. 
Then we can estimate $\mu_i^{(1)}$ by the mean of the $i$th variate of the first 
sample, $\bar{x}_i^{(1)}$, and $\mu_i^{(2)}$ by the mean of the second sample, $\bar{x}_i^{(2)}$. The 
estimate of $\sigma_i \sigma_j \rho_{ij}$ is given by 

$$\frac{1}{N^{(1)} + N^{(2)} - 2} \left[ \sum_{\gamma=1}^{N^{(1)}} \left(x_{i\gamma}^{(1)} - \bar{x}_i^{(1)}\right)\left(x_{j\gamma}^{(1)} - \bar{x}_j^{(1)}\right) + \sum_{\gamma=1}^{N^{(2)}} \left(x_{i\gamma}^{(2)} - \bar{x}_i^{(2)}\right)\left(x_{j\gamma}^{(2)} - \bar{x}_j^{(2)}\right) \right]. \tag{24}$$

We can then substitute these estimates into our definition of $U$, 
obtaining a new linear function of $x_1, \ldots, x_p$ depending on these 
estimates. Since there are now sampling variations in the estimates of 
parameters, we can no longer state that this procedure is best in 
either of the senses used earlier. However, it seems to be a 
reasonable procedure. 
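
A minimal sketch of the estimates described above: the two sample mean vectors and the pooled estimate (24) of the common covariances. Each sample is represented as a list of observations, each observation a list of $p$ values; this representation and the function name are assumptions made for the illustration.

```python
def pooled_estimates(sample1, sample2):
    """Return (mean1, mean2, pooled) where pooled[i][j] is the estimate (24)."""
    p = len(sample1[0])

    def mean_vector(sample):
        return [sum(obs[i] for obs in sample) / len(sample) for i in range(p)]

    def cross_products(sample, mean):
        # Sums of products of deviations from the sample's own mean.
        return [[sum((obs[i] - mean[i]) * (obs[j] - mean[j]) for obs in sample)
                 for j in range(p)] for i in range(p)]

    mean1, mean2 = mean_vector(sample1), mean_vector(sample2)
    s1, s2 = cross_products(sample1, mean1), cross_products(sample2, mean2)
    divisor = len(sample1) + len(sample2) - 2
    pooled = [[(s1[i][j] + s2[i][j]) / divisor for j in range(p)] for i in range(p)]
    return mean1, mean2, pooled


# Made-up data: two observations from the first population, three from the second.
m1, m2, s = pooled_estimates([[0.0, 0.0], [2.0, 2.0]],
                             [[10.0, 1.0], [12.0, 3.0], [14.0, 5.0]])
```

The deviations are taken about each sample's own mean, so a difference between the population means does not inflate the pooled estimate.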

The exact distributions of this statistic cannot be given 
explicitly; however, in the Appendix the distribution is indicated as 
an integral (with respect to three variables). It can be shown that 
as the sample sizes increase, the distributions of this statistic ap- 
proach those of the statistic used when the parameters are known. 
Thus for sufficiently large samples we can proceed exactly as if the 
parameters were known. 

A mnemonic device for the computation of the discriminant 
function (4) is the introduction of a dummy variate, $y$, which is equal 
to a constant (say, 1) when the observation is from $\pi_1$ and is equal 
to another constant (say, 0) when the observation is from $\pi_2$. Then 
(formally) the regression of this dummy variate $y$ on the observed 
variates $x_1, \ldots, x_p$ over the two samples gives a linear function 
proportional to the discriminant function. In a sense this linear function 
is a predictor of the dummy variate $y$. 

Often in studies in psychology or education one has a set of $p$ 
variates $x_1, \ldots, x_p$ and one more variate, $y$, which is a continuous 
variate (taking on more than two values). For example, the $p$ 
variates may be scores on a battery of tests constituting an entrance 
examination, and the other variable $y$ may be a measure of the degree 
of success in college (grade average, etc.). We may define “success” 
then as a score on $y$ equal to or greater than some number $a$. Given 
the scores $x_1, \ldots, x_p$ for an individual, how should we classify him 
as to potentially a success or a failure? 

Let us assume that $x_1, \ldots, x_p, y$ have a joint normal 
distribution. Then from our theory we can deduce that the proper function 
of $x_1, \ldots, x_p$ to use for classification is the regression of $y$ on 
$x_1, \ldots, x_p$, say $\sum_{i=1}^{p} \beta_i (x_i - \mu_i) + \nu$, where $\beta_i$ is the regression coefficient 
of $y$ on $x_i$, $\mu_i$ is the mean of $x_i$ and $\nu$ is the mean of $y$. If this linear 
function is greater than a constant $d$ we classify the individual as 
potentially successful, and if this function is less than $d$ we predict 
failure. The number $d$ depends on the joint normal distribution and 
the costs of misclassification. 

Suppose we do not use this theory but go ahead and find the 
ordinary discriminant function in this problem by replacing $y \geq a$ by 
1 and $y < a$ by 0. Then we arrive at the same linear function. 
However, the constant $\log k$ will in general be different from the constant 
$d$. It would seem that when one must use samples to estimate this 
linear function the latter procedure is not as efficient as the 
former.* 


6. Classification into One of Several Groups. 

Let us now consider the problem of classifying an observation 
into one of several groups. Let $\pi_1, \ldots, \pi_m$ be $m$ populations with 
density functions $p_1(x_1, \ldots, x_p), \ldots, p_m(x_1, \ldots, x_p)$, respectively. 
We wish to divide the space of observations into $m$ mutually 
exclusive and exhaustive regions $R_1, \ldots, R_m$. If an observation falls into 
$R_h$ we shall say that it comes from $\pi_h$. Let the cost of misclassifying 
an observation from $\pi_g$ as coming from $\pi_h$ be $C(h|g)$. The 
probability of this misclassification is 

$$P(h|g,R) = \int_{R_h} p_g(x_1, \ldots, x_p)\, dx_1 \cdots dx_p. \tag{25}$$

If the observation is from $\pi_g$, the expected loss or risk is 

$$r(g,R) = \sum_{h \neq g} C(h|g)\, P(h|g,R). \tag{26}$$

Suppose we have a priori probabilities of the populations, $q_1, \ldots, q_m$. 
Then the expected loss is 

$$\sum_{i=1}^{m} q_i\, r(i,R) = \sum_{i=1}^{m} q_i \left[ \sum_{\substack{j=1 \\ j \neq i}}^{m} C(j|i)\, P(j|i,R) \right]. \tag{27}$$

We would like to choose $R_1, \ldots, R_m$ to make this a minimum. 


*Miss Rosedith Sitgreaves is studying this problem at Columbia University. 

Since we have a priori probabilities for the populations we can 
define the conditional probability of an observation coming from a 
population given the values of observed variates, $x_1, \ldots, x_p$. The 
conditional probability of the observation coming from $\pi_g$ is 

$$\frac{q_g p_g(x_1, \ldots, x_p)}{q_1 p_1(x_1, \ldots, x_p) + \cdots + q_m p_m(x_1, \ldots, x_p)}. \tag{28}$$

If we classify the observation as from $\pi_h$ the expected loss is 

$$\sum_{\substack{g=1 \\ g \neq h}}^{m} \frac{q_g p_g(x)}{\sum_{k=1}^{m} q_k p_k(x)}\, C(h|g), \tag{29}$$

where $x$ denotes $x_1, \ldots, x_p$. We minimize the expected loss at this 
point if we choose $h$ so as to minimize (29); that is, we consider 

$$\sum_{\substack{g=1 \\ g \neq h}}^{m} q_g p_g(x)\, C(h|g) \tag{30}$$

for all $h$ and select that $h$ that gives the minimum (if two different 
indices give the minimum, it is irrelevant which index is selected). 
This procedure assigns the point $x_1, \ldots, x_p$ to one of the $R_h$. 
Following this procedure for each point we define our regions $R_1, \ldots, R_m$ 
according to 

$$R_k: \; \sum_{\substack{g=1 \\ g \neq k}}^{m} q_g p_g(x)\, C(k|g) < \sum_{\substack{g=1 \\ g \neq h}}^{m} q_g p_g(x)\, C(h|g), \qquad h = 1, \ldots, m, \; h \neq k. \tag{31}$$

If $C(h|g) = 1$ for all $g$ and $h$ $(g \neq h)$, then $x_1, \ldots, x_p$ is in $R_k$ if 

$$\sum_{g \neq k} q_g p_g(x) < \sum_{g \neq h} q_g p_g(x) \qquad (h \neq k). \tag{32}$$

Subtracting $\sum_{g \neq k,h} q_g p_g(x)$ from both sides of (32) we obtain 

$$R_k: \; q_h p_h(x) < q_k p_k(x) \qquad (h \neq k). \tag{33}$$

In this case the point $x_1, \ldots, x_p$ is in $R_k$ if $k$ is the index for which 
$q_g p_g(x)$ is a maximum, that is, $\pi_k$ is the most probable population. 
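
The rule for several populations amounts to evaluating (30) for each candidate $h$ and taking the smallest value. A minimal sketch follows; populations are indexed from 0 rather than 1, `cost[h][g]` stands for $C(h|g)$, and the density values and costs in the usage lines are made-up numbers for illustration.

```python
def bayes_assign(densities, priors, cost):
    """Return the index h minimizing sum over g != h of q_g p_g(x) C(h|g).

    densities[g] is p_g(x) at the observed point, priors[g] is q_g,
    and cost[h][g] is the cost of classifying a member of g as h.
    """
    m = len(priors)

    def conditional_loss(h):
        return sum(priors[g] * densities[g] * cost[h][g]
                   for g in range(m) if g != h)

    return min(range(m), key=conditional_loss)


densities_at_x = [0.2, 0.5, 0.3]
equal_priors = [1.0 / 3.0] * 3
unit_costs = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
# Misclassifying a member of population 0 is made ten times as costly.
costly_first = [[0, 1, 1], [10, 0, 1], [10, 1, 0]]

choice_unit = bayes_assign(densities_at_x, equal_priors, unit_costs)
choice_costly = bayes_assign(densities_at_x, equal_priors, costly_first)
```

With unit costs the rule reduces to (33) and picks the population with the largest $q_g p_g(x)$; with the unequal costs the same point is assigned to population 0 even though it is not the most probable one.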
If we do not have a priori probabilities, we cannot define an 
unconditional expected loss for a classification procedure. Then we 
consider the maximum of the risk $r(g,R)$ over all values of $g$. We 
would like to choose $R_1, \ldots, R_m$ to minimize this maximum expected 
loss. 

It can be shown that the definition of the regions is made by 
using (31) where $q_1, \ldots, q_m$ are replaced by $m$ positive numbers that 
make 

$$r(1,R) = \cdots = r(m,R). \tag{34}$$

This number is the expected loss.* 


7. Classification into One of Several Multivariate Normal Popula- 
tions. 

As an example of the theory let us consider the case of $m$ 
multivariate normal populations with the same set of variances and 
correlations. Let the mean of $x_i$ in $\pi_g$ be $\mu_i^{(g)}$. Then 

$$\log \frac{p_g(x_1, \ldots, x_p)}{p_h(x_1, \ldots, x_p)} = \sum_{i=1}^{p} \lambda_i^{(gh)} x_i - \tfrac{1}{2} \sum_{i=1}^{p} \lambda_i^{(gh)} \left(\mu_i^{(g)} + \mu_i^{(h)}\right), \tag{35}$$

where $\lambda_1^{(gh)}, \ldots, \lambda_p^{(gh)}$ are given as the solution to 

$$\sum_{j=1}^{p} \sigma_i \sigma_j \rho_{ij} \lambda_j^{(gh)} = \mu_i^{(g)} - \mu_i^{(h)}. \tag{36}$$

For the sake of simplicity we assume that the costs of misclassification 
are equal. If a priori probabilities, $q_1, \ldots, q_m$, are known, the 
regions are defined by 

$$R_g: \; u_{gh}(x_1, \ldots, x_p) > \log \frac{q_h}{q_g} = \log q_h - \log q_g, \qquad h = 1, \ldots, m, \; h \neq g, \tag{37}$$

where $u_{gh}(x_1, \ldots, x_p)$ is (35). 

If a priori probabilities are not known, the regions are defined 
by 

$$R_g: \; u_{gh}(x_1, \ldots, x_p) > c_h - c_g, \qquad h = 1, \ldots, m, \; h \neq g, \tag{38}$$

where the numbers $c_1, \ldots, c_m$ are to be determined so that 

$$P(1|1,R) = \cdots = P(m|m,R). \tag{39}$$

To determine these constants we use the fact that if the observation 
is from $\pi_g$, $u_{gh}(x_1, \ldots, x_p)$, $h = 1, \ldots, m$ and $h \neq g$, have a joint 
normal distribution with means 


*The theory was first given for the case of equal costs of misclassification 
by R. von Mises (7). 

$$\mathcal{E}\, u_{gh}(x_1, \ldots, x_p) = \tfrac{1}{2} \sum_{i=1}^{p} \lambda_i^{(gh)} \left(\mu_i^{(g)} - \mu_i^{(h)}\right). \tag{40}$$

The variance of $u_{gh}(x_1, \ldots, x_p)$ is twice (40), and the covariance 
between $u_{gh}(x_1, \ldots, x_p)$ and $u_{gk}(x_1, \ldots, x_p)$ is 

$$\sum_{i=1}^{p} \lambda_i^{(gh)} \left(\mu_i^{(g)} - \mu_i^{(k)}\right) = \sum_{i=1}^{p} \lambda_i^{(gk)} \left(\mu_i^{(g)} - \mu_i^{(h)}\right). \tag{41}$$

From this one can determine $P(g|g,R)$ for any set of constants 
$c_1, \ldots, c_m$. 

In any given case it would be exceedingly difficult to determine 
these constants so that (39) would be satisfied. Perhaps a 
reasonable practical procedure would be to let $q_1 = \cdots = q_m = 1/m$ and then 
see whether (39) were approximately satisfied. However, even the 
computation of $P(g|g,R)$ for given constants is usually far from 
easy. 

It should be pointed out that this procedure divides our space by 
means of hyperplanes. If $p = 2$ and $m = 3$, the division is by 
half-lines as in Figure 3. 


FIGURE 3 (division of the $x_1, x_2$ plane into three regions by half-lines) 


Finally, we remark that if the populations are unknown, we can 
estimate the parameters by means of samples, one from each 
population. If the samples are large enough, the above procedures can be 
used as if the parameters were known. 


8. An Example of Classification into One of Several Multivariate 
Normal Populations. 


In (6) Rao considers three populations consisting of the Brahmin 
caste ($\pi_1$), the Artisan caste ($\pi_2$), and the Korwa caste ($\pi_3$) of 
India. The measurements for each individual of a caste are stature 
($x_1$), sitting height ($x_2$), nasal depth ($x_3$), and nasal height ($x_4$). 
The means of these variables in the three populations are 


                            Brahmin    Artisan    Korwa 
                            ($\pi_1$)    ($\pi_2$)    ($\pi_3$) 
Stature ($x_1$)              164.51     160.53     158.17 
Sitting height ($x_2$)        86.43      81.47      81.16 
Nasal depth ($x_3$)           25.49      23.84      21.44 
Nasal height ($x_4$)          51.24      48.62      46.72 


The matrix of correlations for all the populations is 


    1.0000    .5849    .1774    .1974 
     .5849   1.0000    .2094    .2170 
     .1774    .2094   1.0000    .2910 
     .1974    .2170    .2910   1.0000 


The standard deviations are $\sigma_1 = 5.74$, $\sigma_2 = 3.20$, $\sigma_3 = 1.75$, $\sigma_4 = 3.50$. 
We assume that each population is normal. Our problem is to divide 
the space of the four variables $x_1, x_2, x_3, x_4$ into three regions of 
classification. We assume that the costs of misclassifications are 
equal. We shall find (i) a set of regions under the assumption that 
drawing a new observation from each population is equally likely 
($q_1 = q_2 = q_3 = 1/3$) and (ii) a set of regions such that the largest 
probability of misclassification is minimized (the minimax solution). 


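We first compute the coefficients $\lambda_i^{(12)}$ and $\lambda_i^{(13)}$ defined by (36); 
the other $\lambda$’s are obtained from the relations $\lambda_i^{(hg)} = -\lambda_i^{(gh)}$ and 
$\lambda_i^{(23)} = \lambda_i^{(13)} - \lambda_i^{(12)}$. After calculating $\tfrac{1}{2} \sum_{i=1}^{4} \lambda_i^{(gh)} \left(\mu_i^{(g)} + \mu_i^{(h)}\right)$, we 
obtain the “discriminant functions” given by* (35). 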


*Due to an error in computations Rao’s discriminant functions (6) are 
incorrect. I am indebted to Mr. Peter Frank for assistance in the computations. 

$$\begin{aligned} u_{12}(x_1,x_2,x_3,x_4) &= -.0708x_1 + .4990x_2 + .3373x_3 + .0887x_4 - 43.13, \\ u_{13}(x_1,x_2,x_3,x_4) &= .0003x_1 + .3550x_2 + 1.1063x_3 + .1375x_4 - 62.49, \\ u_{23}(x_1,x_2,x_3,x_4) &= .0711x_1 - .1440x_2 + .7690x_3 + .0488x_4 - 19.36. \end{aligned}$$

The other three functions are $u_{21}(x_1,x_2,x_3,x_4) = -u_{12}(x_1,x_2,x_3,x_4)$, 
$u_{31}(x_1,x_2,x_3,x_4) = -u_{13}(x_1,x_2,x_3,x_4)$, and $u_{32}(x_1,x_2,x_3,x_4) = -u_{23}(x_1,x_2,x_3,x_4)$. 
If there are a priori probabilities and they are equal, the best 
set of regions of classification are $R_1$: $u_{12}(x_1,x_2,x_3,x_4) \geq 0$, 
$u_{13}(x_1,x_2,x_3,x_4) \geq 0$; $R_2$: $u_{21}(x_1,x_2,x_3,x_4) \geq 0$, $u_{23}(x_1,x_2,x_3,x_4) \geq 0$; 
and $R_3$: $u_{31}(x_1,x_2,x_3,x_4) \geq 0$, $u_{32}(x_1,x_2,x_3,x_4) \geq 0$. For example, 
if we obtain an individual with measurements $x'_1, x'_2, x'_3, x'_4$ such that 
$u_{12}(x'_1,x'_2,x'_3,x'_4) \geq 0$ and $u_{13}(x'_1,x'_2,x'_3,x'_4) \geq 0$ we classify him as a Brahmin. 
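
The equal-probability rule can be sketched directly from the three discriminant functions. The coefficients below are read from the functions above, with the constant terms taken as negative as the form of (35) requires; these readings, and the function names, are assumptions of this sketch rather than a restatement of the original computations.

```python
# (coefficients for x1..x4, constant term) for u12, u13 and u23.
U12 = ((-0.0708, 0.4990, 0.3373, 0.0887), -43.13)
U13 = ((0.0003, 0.3550, 1.1063, 0.1375), -62.49)
U23 = ((0.0711, -0.1440, 0.7690, 0.0488), -19.36)


def evaluate(function, x):
    """Value of a linear discriminant function at the measurement vector x."""
    coefficients, constant = function
    return sum(c * v for c, v in zip(coefficients, x)) + constant


def classify(x):
    """Assign x = (stature, sitting height, nasal depth, nasal height)
    to a caste, assuming equal a priori probabilities and equal costs."""
    u12, u13, u23 = evaluate(U12, x), evaluate(U13, x), evaluate(U23, x)
    if u12 >= 0 and u13 >= 0:
        return "Brahmin"
    if -u12 >= 0 and u23 >= 0:   # u21 = -u12
        return "Artisan"
    return "Korwa"               # remaining case: u31 >= 0 and u32 >= 0


# The population mean vectors from the table of means, used as test points.
brahmin_mean = (164.51, 86.43, 25.49, 51.24)
artisan_mean = (160.53, 81.47, 23.84, 48.62)
korwa_mean = (158.17, 81.16, 21.44, 46.72)
```

Because $u_{13} = u_{12} + u_{23}$ holds for these coefficients, a point that satisfies neither of the first two pairs of inequalities necessarily satisfies the third, so the final branch needs no explicit check.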

To find the probabilities of misclassification when an individual 
is drawn from population $\pi_g$ we need the means, variances, and 
covariance of the proper pairs of $u$’s. They are 


Population of                             Standard 
$x_1, x_2, x_3, x_4$      $u$      Means    Deviation    Correlation 

$\pi_1$               $u_{12}$    1.491      1.727         .8658 
                      $u_{13}$    3.487      2.641 

$\pi_2$               $u_{21}$    1.491      1.727        -.3894 
                      $u_{23}$    1.031      1.436 

$\pi_3$               $u_{31}$    3.487      2.641         .7983 
                      $u_{32}$    1.031      1.436 


The probabilities of misclassification are then obtained by use of the 
tables for the bivariate normal distribution (5). These probabilities 
are .21 for $\pi_1$, .42 for $\pi_2$, and .25 for $\pi_3$. For example, if 
measurements are made on a Brahmin, the probability that he is classified 
as an Artisan or Korwa is .21. 


The minimax solution is obtained by finding the constants $c_1$, 
$c_2$, and $c_3$ for (38) so that the probabilities of misclassification are 
equal. The regions of classification are 


$R'_1$: $u_{12}(x_1,x_2,x_3,x_4) \geq .54$, $u_{13}(x_1,x_2,x_3,x_4) \geq .29$; 
$R'_2$: $u_{21}(x_1,x_2,x_3,x_4) \geq -.54$, $u_{23}(x_1,x_2,x_3,x_4) \geq -.25$; 
$R'_3$: $u_{31}(x_1,x_2,x_3,x_4) \geq -.29$, $u_{32}(x_1,x_2,x_3,x_4) \geq .25$. 


The common probability of misclassification (to two decimal places) 
is .30. Thus, the maximum probability of misclassification has been 
reduced from .42 to .30. 


Appendix: On the Distribution of the Classification Statistic. 


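In (8) Wald considered the distributions of a class of statistics 
of which the classification statistic given in Section 5 is a special 
case. Wald showed that such a statistic can be written as a function 
of three quantities, say, $m_1$, $m_2$, $m_3$. His expression for the 
distribution of $m_1$, $m_2$, $m_3$ involved an expected value that was not 
evaluated. In this appendix we shall give this distribution of $m_1$, 
$m_2$, and $m_3$ explicitly. 

Let the elements defined by (24) be $s_{ij}$ and let $(\|s_{ij}\|)^{-1} = \|s^{ij}\|$. 
Then the sample estimate of $\lambda_i$ is $\sum_{j=1}^{p} s^{ij}\left(\bar{x}_j^{(1)} - \bar{x}_j^{(2)}\right)$, and the 
sample equivalent of $U$ is 

$$V = \sum_{i,j=1}^{p} \left[x_i - \tfrac{1}{2}\left(\bar{x}_i^{(1)} + \bar{x}_i^{(2)}\right)\right] s^{ij} \left(\bar{x}_j^{(1)} - \bar{x}_j^{(2)}\right). \tag{42}$$

Let 

$$t_{i,n+1} = \sqrt{\frac{N^{(1)} + N^{(2)}}{N^{(1)} + N^{(2)} + 1}}\, \left(x_i - \bar{x}_i\right),$$

$$t_{i,n+2} = \sqrt{\frac{N^{(1)} N^{(2)}}{N^{(1)} + N^{(2)}}}\, \left(\bar{x}_i^{(1)} - \bar{x}_i^{(2)}\right),$$

where 
$\bar{x}_i = \left(N^{(1)} \bar{x}_i^{(1)} + N^{(2)} \bar{x}_i^{(2)}\right) / \left(N^{(1)} + N^{(2)}\right)$ and $n = N^{(1)} + N^{(2)} - 2$. 
Then 

$$V = \sqrt{\frac{N^{(1)} + N^{(2)} + 1}{N^{(1)} N^{(2)}}}\; W_1 + \frac{N^{(1)} - N^{(2)}}{2 N^{(1)} N^{(2)}}\; W_2, \tag{43}$$

where 

$$W_1 = \sum_{i,j=1}^{p} t_{i,n+1}\, s^{ij}\, t_{j,n+2}, \qquad W_2 = \sum_{i,j=1}^{p} t_{i,n+2}\, s^{ij}\, t_{j,n+2}. \tag{44}$$

$W_2$ is proportional to Hotelling’s generalized $T^2$ statistic for 
testing $\mu_i^{(1)} = \mu_i^{(2)}$. 

The set $t_{i,n+1}$ and the set $t_{i,n+2}$ have multivariate normal 
distributions with variances $\sigma_i^2$ and correlations $\rho_{ij}$. The set $t_{i,n+1}$, $t_{i,n+2}$ and 
$s_{ij}$ have the joint distribution assumed by Wald, and $\mathcal{E} t_{i,n+1}$ is 
proportional to $\mathcal{E} t_{i,n+2}$. Wald expresses $W_1$ as 

$$W_1 = n\, \frac{m_3}{(1 - m_1)(1 - m_2) - m_3^2}, \tag{45}$$

where $m_1$, $m_2$, and $m_3$ are certain functions of $s_{ij}$, $t_{i,n+1}$, $t_{i,n+2}$. It is 
easy to verify that 

$$W_2 = n\, \frac{m_2 - m_1 m_2 + m_3^2}{(1 - m_1)(1 - m_2) - m_3^2}. \tag{46}$$

The joint density of $m_1$, $m_2$, and $m_3$ involves the expected value of 
the determinant $\left|\sum_{\nu=1}^{p} t_{i\nu} t_{j\nu}\right|$, where $t_{i\nu}$ have a certain normal 
distribution. This expected value which Wald did not evaluate is a special 
case* of (1). Thus we obtain as the density of $m_1$, $m_2$, and $m_3$ 

$$K e^{-\frac{1}{2}\alpha(c^2 + k^2)} \left(m_1 m_2 - m_3^2\right)^{\frac{1}{2}(p-3)} \left[(1 - m_1)(1 - m_2) - m_3^2\right]^{\frac{1}{2}(n-p-1)} \sum_{\nu=0}^{\infty} \frac{\Gamma\left(\tfrac{1}{2}n + \nu + 1\right) \alpha^{\nu} \left(k^2 m_1 + 2kc\, m_3 + c^2 m_2\right)^{\nu}}{\Gamma\left(\tfrac{1}{2}p + \nu\right) \nu!\, 2^{\nu}}, \tag{47}$$

$$\left(m_1 m_2 - m_3^2 \geq 0, \; (1 - m_1)(1 - m_2) - m_3^2 \geq 0, \; 0 \leq m_1, m_2 \leq 1\right),$$

where 

$$k = \sqrt{N^{(1)} N^{(2)} / \left(N^{(1)} + N^{(2)}\right)},$$

$$c = N^{(2)} \Big/ \sqrt{\left(N^{(1)} + N^{(2)}\right)\left(N^{(1)} + N^{(2)} + 1\right)}$$

if the observation is from $\pi_1$, and 

$$c = -N^{(1)} \Big/ \sqrt{\left(N^{(1)} + N^{(2)}\right)\left(N^{(1)} + N^{(2)} + 1\right)}$$

if the observation is from $\pi_2$, and $K$ is a number chosen to make the 
integral of (47) over the entire range of $m_1$, $m_2$, and $m_3$ equal to 
one ($\alpha$ is defined in Section 4). In principle the density of $V$ can be 
obtained from (47) by integrating out two variables imposing (43), 
(45), and (46). 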








*One needs to justify the expression in (1) for the non-central Wishart 
distribution when $\nu$ runs over the same range as $i$. This can be done by deriving 
the non-central Wishart distribution from the distribution of correlation 
coefficients, $\chi^2$-distributions, and a non-central $\chi^2$-distribution. 


THE RELATIONSHIP BETWEEN THE METHOD OF SUCCESSIVE 
RESIDUALS AND THE METHOD 
OF EXHAUSTION 


WILBUR L. LAYTON 
UNIVERSITY OF MINNESOTA 


The relationship between Horst’s method of successive residuals 
and Gengerelli’s method of exhaustion is demonstrated by 
transforming both methods into L notation. The L notation form is much 
more efficient computationally. 


Horst (2), in 1934, presented the development of a method of 
item analysis called the method of successive residuals. Recently, 
Gengerelli (1) presented an “exhaustion” method for calculating 
regression coefficients. The present article proposes to show the 
relationship between the two methods by transforming them both into L 
notation (3), which is computationally more efficient. 

In the method of successive residuals that item having the larg- 
est coefficient of correlation with the criterion is selected as the first 
item of a composite. That portion of the criterion which is predicted 
by the first selected item is subtracted from the criterion. This leaves 
a criterion residual. The next item selected is that item which has 
the highest coefficient of correlation with the criterion residual. That 
part of the criterion residual which is predicted by the second se- 
lected item is subtracted from the criterion residual. This leaves a 
second criterion residual. The third item selected is that item which 
has the highest coefficient of correlation with the second criterion 
residual. This process is continued in the same manner to select the 
n items to be included in the composite. 

Horst presents the general formula for a criterion residual as

$$C_{e-1} - (A_e X_e + B_e) = C_e, \tag{1}$$

where $A_e$ and $B_e$ have been determined by the method of least squares
and $C_e$ indicates that part of the criterion which cannot be predicted
by item $e$, i.e., $C_e$ is a criterion residual.

The correlation between the criterion residual $C_e$ and the next
item to be selected is given by Horst as

$$r_{C_e X_k} = r_{(C_{e-1} - A_e X_e) X_k}, \tag{2}$$

where $k$ is the item being considered for inclusion in the composite.
His computational formula is

$$N \sigma_{C_e} r_{(C_{e-1} - A_e X_e) X_k} = \frac{N \sum (C_{e-1} - A_e X_e) X_k - N_k \sum (C_{e-1} - A_e X_e)}{\sqrt{N N_k - N_k^2}}, \tag{3}$$

where $N_k$ is the number of individuals who marked item $k$ in a
specified manner and is consequently both $\sum X_k$ and $\sum X_k^2$.
The gross score weight, $A_e$, is defined by Horst as

$$A_e = \frac{N \sigma_{C_{e-1}} r_{C_{e-1} X_e}}{\sqrt{N N_e - N_e^2}}. \tag{4}$$





These two formulas provide a method for assigning differential 
gross score weights to items. 

Horst extended his method to provide a means for assigning 
positive or negative unit weights to items. In so doing, he defined 
the coefficient of correlation between successive criterion residuals 
and the successive items to be selected in terms of the original cri- 
terion rather than in terms of the criterion residuals themselves. 
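This formula is

$$N \sigma_{C_e} r_{C_e X_k} = \frac{N\left[\sum C_0 X_k - A_e(\pm N_{1k} \pm \cdots \pm N_{ek})\right] - N_k\left[\sum C_0 - A_e(\pm N_1 \pm \cdots \pm N_e)\right]}{\sqrt{N N_k - N_k^2}}, \tag{5}$$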


where $N_{ek}$ is the number of people answering both items $e$ and $k$ in
a specified manner, and is therefore $\sum X_e X_k$. By analogy we can
write a similar formula in which differential item weights are
involved as

$$N \sigma_{C_e} r_{C_e X_k} = \frac{N \sum [C_0 - (A_1 X_1 + \cdots + A_e X_e)] X_k - \sum X_k \sum [C_0 - (A_1 X_1 + \cdots + A_e X_e)]}{\sqrt{N \sum X_k^2 - (\sum X_k)^2}}. \tag{6}$$


It is obvious that formula (6) can be used to select tests for bat- 
teries as well as selecting items for tests. 


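Formula (6) can be rewritten as

$$N \sigma_{C_e} r_{C_e X_k} = \frac{N \sum C_0 X_k - N A_1 \sum X_1 X_k - \cdots - N A_e \sum X_e X_k - \sum X_k \sum C_0 + A_1 \sum X_k \sum X_1 + \cdots + A_e \sum X_k \sum X_e}{\sqrt{N \sum X_k^2 - (\sum X_k)^2}}. \tag{7}$$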
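Collecting terms, we find

$$N \sigma_{C_e} r_{C_e X_k} = \frac{N \sum C_0 X_k - \sum X_k \sum C_0 - A_1 (N \sum X_1 X_k - \sum X_1 \sum X_k) - \cdots - A_e (N \sum X_e X_k - \sum X_e \sum X_k)}{\sqrt{N \sum X_k^2 - (\sum X_k)^2}}. \tag{8}$$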
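Transforming into L notation after Toops (3), we have

$$N \sigma_{C_e} r_{C_e X_k} = \frac{L_{ck} - A_1 L_{1k} - \cdots - A_e L_{ek}}{\sqrt{L_{kk}}}, \tag{9}$$

where the subscript $c$ refers to the original criterion $C_0$ and

$$L_{ii} = N \sum X_i^2 - (\sum X_i)^2 \quad \text{and} \quad L_{ij} = N \sum X_i X_j - \sum X_i \sum X_j .$$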


It is clear that dividing (9) by $\sqrt{L_{kk}}$, we obtain the formula for
the differential weight $A_k$ for item $k$ as

$$A_k = \frac{L_{ck} - A_1 L_{1k} - \cdots - A_e L_{ek}}{L_{kk}}. \tag{10}$$


Formula (9) can be considered a general formula to be used in se- 
lecting successive items or tests and formula (10) as a general for- 
mula for determining the differential weight for an item or test after 
it has been selected. 
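
The selection step of formula (9) and the weighting step of formula (10) can be sketched in Python. This is only an illustration under stated assumptions, not Horst's or Toops' own computing routine: the function names, the use of NumPy, the rule of picking the largest absolute value of (9), and the rule that no item is chosen twice are all assumptions made here.

```python
import numpy as np

def l_matrix(data):
    """L_ij = N * sum(X_i X_j) - sum(X_i) * sum(X_j) for every pair of columns."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    sums = data.sum(axis=0)
    return n * (data.T @ data) - np.outer(sums, sums)

def select_by_successive_residuals(items, criterion, n_select):
    """Pick items one at a time with formula (9); weight each with formula (10)."""
    items = np.asarray(items, dtype=float)
    n_items = items.shape[1]
    # The criterion is appended as the last column, so L[n_items, k] is L_ck.
    L = l_matrix(np.column_stack([items, criterion]))
    L_kk = np.diag(L)[:n_items]
    # Running numerator L_ck - A_1 L_1k - ... - A_e L_ek, one entry per item k.
    numer = L[n_items, :n_items].copy()
    taken = np.zeros(n_items, dtype=bool)
    selected, weights = [], []
    for _ in range(n_select):
        score = np.abs(numer) / np.sqrt(L_kk)   # magnitude of formula (9)
        score[taken] = -np.inf                  # assumption: no item is chosen twice
        k = int(np.argmax(score))
        a_k = numer[k] / L_kk[k]                # formula (10)
        numer = numer - a_k * L[k, :n_items]    # accumulate for the next selection
        taken[k] = True
        selected.append(k)
        weights.append(float(a_k))
    return selected, weights
```

Only the row of L values belonging to the item just selected is needed at each step, which is the computational saving the L notation is meant to give. The sketch assumes every item has nonzero variance.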

Let us now turn to Gengerelli’s method of exhaustion. The method
of exhaustion is used to determine successively the values of the
regression coefficients by solving in turn Gengerelli’s formulas of
the types

$$\beta_1 = r_{c1}, \tag{11}$$

$$\beta_2 = r_{c2} - \beta_1 r_{12}, \quad \text{and} \tag{12}$$

$$\beta_3 = r_{c3} - \beta_1 r_{13} - \beta_2 r_{23}. \tag{13}$$


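Substituting (11) in (12) we have

$$\beta_2 = r_{c2} - r_{c1} r_{12}, \tag{14}$$

and substituting (11) and (14) in (13) we have

$$\beta_3 = r_{c3} - r_{c1} r_{13} - (r_{c2} - r_{c1} r_{12})\, r_{23}. \tag{15}$$

Transforming $\beta_1$, $\beta_2$, and $\beta_3$ into gross score weights from
formulas (11), (14), and (15) we have

$$b_1 = r_{c1} \frac{\sigma_c}{\sigma_1}, \tag{16}$$

$$b_2 = r_{c2} \frac{\sigma_c}{\sigma_2} - r_{c1} r_{12} \frac{\sigma_c}{\sigma_2}, \tag{17}$$

and

$$b_3 = r_{c3} \frac{\sigma_c}{\sigma_3} - r_{c1} r_{13} \frac{\sigma_c}{\sigma_3} - (r_{c2} - r_{c1} r_{12})\, r_{23} \frac{\sigma_c}{\sigma_3}. \tag{18}$$

Now

$$r_{xy} = \frac{L_{xy}}{\sqrt{L_{xx}} \sqrt{L_{yy}}}, \tag{19}$$

and

$$\sigma_x^2 = \frac{L_{xx}}{N^2}. \tag{20}$$

Therefore

$$b_1 = \frac{L_{c1}}{\sqrt{L_{cc}} \sqrt{L_{11}}} \cdot \frac{\sqrt{L_{cc}/N^2}}{\sqrt{L_{11}/N^2}}, \tag{21}$$

$$b_2 = \frac{L_{c2}}{\sqrt{L_{cc}} \sqrt{L_{22}}} \cdot \frac{\sqrt{L_{cc}/N^2}}{\sqrt{L_{22}/N^2}} - \frac{L_{c1}}{\sqrt{L_{cc}} \sqrt{L_{11}}} \cdot \frac{L_{12}}{\sqrt{L_{11}} \sqrt{L_{22}}} \cdot \frac{\sqrt{L_{cc}/N^2}}{\sqrt{L_{22}/N^2}}, \tag{22}$$

and

$$b_3 = \frac{L_{c3}}{\sqrt{L_{cc}} \sqrt{L_{33}}} \cdot \frac{\sqrt{L_{cc}/N^2}}{\sqrt{L_{33}/N^2}} - \frac{L_{c1}}{\sqrt{L_{cc}} \sqrt{L_{11}}} \cdot \frac{L_{13}}{\sqrt{L_{11}} \sqrt{L_{33}}} \cdot \frac{\sqrt{L_{cc}/N^2}}{\sqrt{L_{33}/N^2}} - \left( \frac{L_{c2}}{\sqrt{L_{cc}} \sqrt{L_{22}}} - \frac{L_{c1}}{\sqrt{L_{cc}} \sqrt{L_{11}}} \cdot \frac{L_{12}}{\sqrt{L_{11}} \sqrt{L_{22}}} \right) \frac{L_{23}}{\sqrt{L_{22}} \sqrt{L_{33}}} \cdot \frac{\sqrt{L_{cc}/N^2}}{\sqrt{L_{33}/N^2}}. \tag{23}$$

Simplifying

$$b_1 = \frac{L_{c1}}{L_{11}}, \tag{24}$$

$$b_2 = \frac{L_{c2} - \dfrac{L_{c1}}{L_{11}} L_{12}}{L_{22}}, \tag{25}$$

and

$$b_3 = \frac{L_{c3} - \dfrac{L_{c1}}{L_{11}} L_{13} - \left( \dfrac{L_{c2}}{L_{22}} - \dfrac{L_{c1}}{L_{11}} \cdot \dfrac{L_{12}}{L_{22}} \right) L_{23}}{L_{33}}. \tag{26}$$

Then:

$$b_2 = \frac{L_{c2} - b_1 L_{12}}{L_{22}}, \tag{27}$$

and

$$b_3 = \frac{L_{c3} - b_1 L_{13} - b_2 L_{23}}{L_{33}}. \tag{28}$$

In general,

$$b_n = \frac{L_{cn} - b_1 L_{1n} - \cdots - b_{n-1} L_{n-1,n}}{L_{nn}}. \tag{29}$$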


It is apparent that formula (29) based on Gengerelli’s method 
of exhaustion is identical with formula (10) based on an extension 
of Horst’s method of successive residuals. 
Let us now return to formulas (9)

$$N \sigma_{C_e} r_{C_e X_k} = \frac{L_{ck} - A_1 L_{1k} - \cdots - A_e L_{ek}}{\sqrt{L_{kk}}}$$

and (10)

$$A_k = \frac{L_{ck} - A_1 L_{1k} - \cdots - A_e L_{ek}}{L_{kk}}.$$

In L notation, the formula for the coefficient of correlation between
the criterion and the composite of selected tests or items becomes

$$r_{c(A_1 X_1 + A_2 X_2 + \cdots + A_e X_e)} = \frac{A_1 L_{c1} + A_2 L_{c2} + \cdots + A_e L_{ce}}{\sqrt{L_{cc}}\, \sqrt{A_1^2 L_{11} + A_2^2 L_{22} + \cdots + A_e^2 L_{ee} + 2 \sum_{i<j} A_i A_j L_{ij}}}. \tag{30}$$


It will be noted that the numerator increases accumulatively from
selection to selection by the amount $A_e L_{ce}$ and that the right-hand
element of the denominator increases accumulatively by the amount
$A_e^2 L_{ee} + 2 \sum_{i<e} A_i A_e L_{ie}$. It is obvious that formula (10) is
merely formula (9) divided by $\sqrt{L_{kk}}$. Formula (9) makes for easy
computation since the numerator accumulates from each item or test
selected to the next.
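
Formula (30) can be checked numerically with a short Python sketch. The function name and the use of NumPy are assumptions made for illustration; the computation uses only L values, with the criterion appended as the last column of the data.

```python
import numpy as np

def composite_correlation(items, criterion, weights):
    """Formula (30): correlation of the criterion with sum_i A_i X_i, from L values."""
    items = np.asarray(items, dtype=float)
    weights = np.asarray(weights, dtype=float)
    data = np.column_stack([items, criterion])
    n = data.shape[0]
    sums = data.sum(axis=0)
    L = n * (data.T @ data) - np.outer(sums, sums)
    m = items.shape[1]
    numerator = weights @ L[m, :m]               # A_1 L_c1 + ... + A_e L_ce
    composite_L = weights @ L[:m, :m] @ weights  # sum A_i^2 L_ii + 2 sum A_i A_j L_ij
    return numerator / np.sqrt(L[m, m] * composite_L)
```

The result should agree with the ordinary product-moment correlation between the criterion and the weighted composite score, whatever weights are supplied.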

Further, if one is dealing with large numbers of items or tests the
$L_{ij}$ of a specific item or test with all the other variables being
considered need not be computed until the variable has definitely been
selected as part of the composite. Punched card methods aid in this
since the sums necessary for the computation of the L’s are quickly
run off on an IBM tabulator and summary punch. If one is dealing
with test items, the IBM sorter will deliver the $\sum X_i X_j$ of the selected
item with other items from multiple-punched cards very rapidly.
From the obtained sums one can compute L’s at a fast rate using a
calculator which has automatic positive and negative multiplication.


REFERENCES 
1. Gengerelli, J. A. A simplified method of approximating multiple regression 
coefficients. Psychometrika, 1948, 13, 135-146. 
2. Horst, Paul. Item analysis by the method of successive residuals. Journal 
of Experimental Education, 1934, 2, 254-268. 
3. Toops, H. A. The L-Method. Psychometrika, 1941, 6, 249-266.
THE RELATIONSHIP BETWEEN THE VALIDITY OF A
SINGLE TEST AND ITS CONTRIBUTION TO THE
PREDICTIVE EFFICIENCY OF A
TEST BATTERY


PAUL HORST 
EDUCATIONAL TESTING SERVICE 


Typical selection or classification testing programs should pro- 
vide for improvement of the predictive efficiency of the test battery. 
Such provision calls for the administration of experimental tests 
along with the operational battery administration and follow-up 
analysis to determine the value of the experimental material. It is 
possible to determine without waiting for criterion data what the 
validity of the experimental test must be in order to improve the 
battery validity. The method together with the proof is presented. 


I. The Method 

Any serious program concerned with the prediction of criterion 
measures whether in industry, education, the military services, or 
elsewhere will ordinarily utilize a battery of prediction measures 
rather than a single measure. Furthermore it is urged that for a 
given criterion, the scores on the prediction measures be combined 
into a single predicted criterion score by means of “least square” re- 
gression weights. An adequate testing program should also incor- 
porate within its operational and administrative framework provi- 
sion for the improvement of the prediction battery in terms of its 
multiple correlation with the criterion. The most obvious method for 
improving the battery is to add a test which will improve the mul- 
tiple correlation. But to find whether a test will improve the mul- 
tiple correlation, it is necessary to have its correlations with all the 
other tests in the battery and also its correlation with the criterion. 
In many cases the collection of criterion data and its collation with 
test data is a costly and time-consuming process. Often it is neces- 
sary to wait for months or even years after test data are obtained 
before criterion data on the same cases become available. This is 
true when tests are given to candidates for training or applicants 
for jobs in which success cannot be adequately determined until 
after months or years of performance. 
In such cases it may be very useful to know soon after scores
on the experimental test become available what the validity of the 
test would have to be in order for the test to make a specified in- 
crease in battery validity. If it turns out that the validity must be 
a value which is highly improbable of attainment, then one may 
better forego the time and effort of collecting validity data later on 
and proceed at once to the development and tryout of other test ma- 
terial. Fortunately, if we know the correlation of an experimental 
test with each test in the predictive battery, it is possible to deter- 
mine what its validity must be in order to increase the battery va- 
lidity by a specified amount. 

The formula for determining this required validity coefficient is
given by:

$$r_{ck} = r_{ks} R_{c.t} \pm \sqrt{a(a + 2R_{c.t})(1 - R_{k.t}^2)}, \tag{A}$$

where

$a$ = the specified increase in the multiple correlation,

$r_{ck}$ = the validity required of test $k$ to achieve the increase $a$,

$R_{c.t}$ = the multiple correlation of the test battery, excluding
test $k$, with the criterion,

$r_{ks}$ = the correlation of test $k$ with the predicted criterion
score derived from the test battery with test $k$ excluded,
and

$R_{k.t}$ = the multiple correlation of test $k$ with the test battery.


Formula A requires the multiple correlation of the new test
with the battery. It also requires its correlation with the predicted
criterion scores. If we are willing to take the latter as a sufficiently
close estimate of the former, we can write Formula A thus:

$$r_{ck} = r_{ks} R_{c.t} \pm \sqrt{a(a + 2R_{c.t})(1 - r_{ks}^2)}. \tag{B}$$


It will be noted that Formulas A and B yield two values for $r_{ck}$,
one for the plus sign and one for the minus sign. It will also be noted
that, if $r_{ck}$ has the value indicated by the minus sign, test $k$ will
have a negative regression weight, while if $r_{ck}$ has the value
indicated by the plus sign, the regression weight will be positive. In the
former case the test will serve as a suppression test.


The value of $R_{k.t}$ will, in general, be greater than that of $r_{ks}$.
For any specified increase $a$ in the multiple correlation, Formula B
will, therefore, overestimate the value required of $r_{ck}$ if the plus sign
is used and underestimate it if the minus sign is used. Formula B,
however, is not very sensitive to variations of $r_{ks}$ under the radical.
For example, let us assume that $R_{c.t} = .50$, $r_{ks} = .40$, and $a = .02$. The
required validity, $r_{ck}$, of test $k$ would then be .331 for the plus sign
and .069 for the minus sign. Suppose now, that instead of using
$r_{ks}$, or .40, in the radical, we had used the correct value, $R_{k.t}$, the
multiple correlation of the test with the battery. This value would,
in general, be greater than .40. Assuming it to be .50, Formula A
would give .324 as the required validity of test $k$ for the plus sign
and .076 for the minus sign. These values differ by less than one in
the second decimal place from the approximate values given by
Formula B.
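
The arithmetic of this example can be reproduced with a few lines of Python. The function name and argument names are hypothetical; the function simply evaluates Formula A, or Formula B when no value of the multiple correlation of test k with the battery is supplied.

```python
import math

def required_validity(r_ks, R_ct, a, R_kt=None):
    """Return (plus-sign, minus-sign) values of r_ck.

    Uses Formula A when R_kt is given, otherwise Formula B,
    which substitutes r_ks for R_kt under the radical.
    """
    under = r_ks if R_kt is None else R_kt
    half_width = math.sqrt(a * (a + 2.0 * R_ct) * (1.0 - under ** 2))
    centre = r_ks * R_ct
    return centre + half_width, centre - half_width

upper_b, lower_b = required_validity(0.40, 0.50, 0.02)             # Formula B
upper_a, lower_a = required_validity(0.40, 0.50, 0.02, R_kt=0.50)  # Formula A
print(round(upper_b, 3), round(lower_b, 3))   # 0.331 0.069
print(round(upper_a, 3), round(lower_a, 3))   # 0.324 0.076
```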

In practice, Formula B is much easier to use than Formula A,
since it requires neither the correlation of the experimental test with
each test in the battery nor the multiple correlation of the test with
the battery. Formula B is the equivalent of treating the existing
battery as one variable and the new test as a second variable. That
is, we ask what the validity of a new test must be if it is to make a
specific addition to the present battery as it is currently weighted,
knowing the correlation of the new test with the battery. Formula
A differs from Formula B in that it permits the present weights of
tests in the battery to be changed to the actual regression weights,
including the new test, whereas Formula B uses the existing weights
for tests already in the battery.

To illustrate the use of the formula let us assume that we have 
a battery of tests whose multiple correlation with the criterion is 
.50. This value is fairly typical of the multiple correlations between 
test batteries and a wide variety of academic and industrial criteria. 
Substituting .50 for $R_{c.t}$ in Formula B we have

$$r_{ck} = .50\, r_{ks} \pm \sqrt{a(a + 1)(1 - r_{ks}^2)}. \tag{C}$$


Table 1, based on Formula C, has been prepared to indicate for
various values of $a$ and $r_{ks}$ the required value of $r_{ck}$. For example,
suppose the experimental test correlates .35 with predicted criterion
score and that, in order to justify its inclusion in the battery, it is
specified that it shall increase the multiple correlation of .50 by at
least .02. In the $r_{ks}$ column at the left we find the value, .35; and,
following this row to the column headed “.02,” we read the values,
.309 and .041. Therefore, if a test correlates .35 with the predicted
criterion score and is to increase the multiple correlation by .02, its
validity must lie either above or below the range, .041 to .309.
Similarly, for a predicted criterion correlation of .45 and an increase of
.03 in the multiple correlation, the validity must be above or below
the range, .068 to .382.
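
Entries of this kind can be generated directly from Formula C. The sketch below is illustrative only: the function name is hypothetical, and the rows and columns chosen are not claimed to be those of the published table.

```python
import math

def formula_c(r_ks, a):
    """Formula C: required r_ck (plus sign, minus sign) when the battery R is .50."""
    centre = 0.50 * r_ks
    half_width = math.sqrt(a * (a + 1.0) * (1.0 - r_ks ** 2))
    return centre + half_width, centre - half_width

increments = [0.01, 0.02, 0.03]      # illustrative column headings (values of a)
correlations = [0.25, 0.35, 0.45]    # illustrative row headings (values of r_ks)
for r_ks in correlations:
    cells = ["%.3f / %.3f" % formula_c(r_ks, a) for a in increments]
    print("%.2f   " % r_ks + "   ".join(cells))
```

The row for .35 under the increment .02 gives .309 and .041, and the row for .45 under .03 gives .382 and .068, which are the values quoted in the text.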


TABLE 1

The Validity Required of Test k in Terms of its Correlation with Predicted
Criterion Score and of Desired Increase in Multiple Correlation

Validity Required if R of .50 is to be increased by
It should be pointed out that a test might be considered for the
battery even though it does not increase the predictive value of the
current battery. For example, the new test may be simpler in
construction or require less testing time than one or more of the tests
already in the battery. It might then be possible to substitute the 
new test for such tests without lowering the predictive value of the 
test battery. Thus, economy of time and effort would be achieved 
without loss of predictive efficiency. 


II. Proof of the Method
The proof of Formula A may be developed as follows: 


Let

$r$ = the matrix of intercorrelations of the tests in the battery
exclusive of test $k$,

$\rho$ = the matrix of intercorrelations of the tests in the battery
including test $k$,

$r_c$ = the vector of correlations of the tests in the battery,
exclusive of test $k$, with the criterion,

$\rho_c$ = the vector of correlations of the tests in the battery,
including test $k$, with the criterion,

$r_k$ = the vector of correlations of test $k$ with the other tests
in the battery,

$B$ = the vector of regression weights of the tests in the battery,
including test $k$, with the criterion,

$\beta_c$ = the vector of regression weights of the tests in the battery,
exclusive of test $k$, with the criterion,

$\beta_k$ = the vector of regression weights of the tests in the battery
with test $k$,

$r_{ck}$ = the correlation of test $k$ with the criterion,

$R_{c.t}$ = the multiple correlation of the tests in the battery,
exclusive of test $k$, with the criterion,

$R_{c.T}$ = the multiple correlation of the tests in the battery,
including test $k$, with the criterion, and

$R_{k.t}$ = the multiple correlation of test $k$ with the other tests in
the battery.


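From the above definitions, we have the relations

$$\rho = \begin{bmatrix} r & r_k \\ r_k' & 1 \end{bmatrix}, \tag{1}$$

$$\rho_c = \begin{bmatrix} r_c \\ r_{ck} \end{bmatrix}. \tag{2}$$

From the definitions and well-known formulas, we have

$$\rho^{-1} \rho_c = B. \tag{3}$$

$$\rho_c' \rho^{-1} \rho_c = R_{c.T}^2. \tag{4}$$

$$\rho_c' B = R_{c.T}^2. \tag{5}$$

$$r^{-1} r_c = \beta_c. \tag{6}$$

$$r_c' r^{-1} r_c = R_{c.t}^2. \tag{7}$$

$$r_c' \beta_c = R_{c.t}^2. \tag{8}$$

$$r^{-1} r_k = \beta_k. \tag{9}$$

$$r_k' r^{-1} r_k = R_{k.t}^2. \tag{10}$$

$$r_k' \beta_k = R_{k.t}^2. \tag{11}$$

From (1) we have

$$\rho^{-1} = \begin{bmatrix} r & r_k \\ r_k' & 1 \end{bmatrix}^{-1}. \tag{12}$$

Let

$$\begin{bmatrix} r & r_k \\ r_k' & 1 \end{bmatrix}^{-1} = \begin{bmatrix} s & u \\ u' & d \end{bmatrix}. \tag{13}$$

From (13) we have

$$\begin{bmatrix} r & r_k \\ r_k' & 1 \end{bmatrix} \begin{bmatrix} s & u \\ u' & d \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0' & 1 \end{bmatrix}. \tag{14}$$

Expanding the left-hand side of (14) gives

$$r s + r_k u' = I. \tag{15}$$

$$r u + r_k d = 0. \tag{16}$$

$$r_k' s + u' = 0'. \tag{17}$$

$$r_k' u + d = 1. \tag{18}$$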
Premultiplying (15) by $r^{-1}$ and transposing,

$$s = r^{-1} - r^{-1} r_k u'. \tag{19}$$

Substituting (19) in (17) and solving for $u$,

$$u = \frac{-r^{-1} r_k}{1 - r_k' r^{-1} r_k}. \tag{20}$$

Substituting (20) in (19) gives

$$s = r^{-1} + \frac{r^{-1} r_k r_k' r^{-1}}{1 - r_k' r^{-1} r_k}. \tag{21}$$

Substituting (20) in (18) and solving for $d$,

$$d = \frac{1}{1 - r_k' r^{-1} r_k}. \tag{22}$$


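Substituting (9) and (10) in (21), (20), and (22), respectively,

$$s = r^{-1} + \frac{\beta_k \beta_k'}{1 - R_{k.t}^2}, \tag{23}$$

$$u = \frac{-\beta_k}{1 - R_{k.t}^2}, \tag{24}$$

$$d = \frac{1}{1 - R_{k.t}^2}. \tag{25}$$

From (12), (13), (23), (24), and (25), we have

$$\rho^{-1} = \frac{1}{1 - R_{k.t}^2} \begin{bmatrix} (1 - R_{k.t}^2)\, r^{-1} + \beta_k \beta_k' & -\beta_k \\ -\beta_k' & 1 \end{bmatrix}. \tag{26}$$

From (2), (3), (6), and (26),

$$B = \frac{1}{1 - R_{k.t}^2} \begin{bmatrix} (1 - R_{k.t}^2)\, r^{-1} + \beta_k \beta_k' & -\beta_k \\ -\beta_k' & 1 \end{bmatrix} \begin{bmatrix} r_c \\ r_{ck} \end{bmatrix},$$

or,

$$B = \frac{1}{1 - R_{k.t}^2} \begin{bmatrix} (1 - R_{k.t}^2)\, \beta_c + \beta_k \beta_k' r_c - \beta_k r_{ck} \\ -\beta_k' r_c + r_{ck} \end{bmatrix}. \tag{27}$$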
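From (2), (5), and (27),

$$R_{c.T}^2 = \frac{1}{1 - R_{k.t}^2} \begin{bmatrix} r_c' & r_{ck} \end{bmatrix} \begin{bmatrix} (1 - R_{k.t}^2)\, \beta_c + \beta_k \beta_k' r_c - \beta_k r_{ck} \\ -\beta_k' r_c + r_{ck} \end{bmatrix},$$

or,

$$R_{c.T}^2 = \frac{1}{1 - R_{k.t}^2} \left( (1 - R_{k.t}^2)\, r_c' \beta_c + (r_c' \beta_k)^2 - r_c' \beta_k r_{ck} - \beta_k' r_c r_{ck} + r_{ck}^2 \right). \tag{28}$$

Substituting from (8) in (28) and simplifying,

$$R_{c.T}^2 = R_{c.t}^2 + \frac{(r_{ck} - r_c' \beta_k)^2}{1 - R_{k.t}^2}. \tag{29}$$

Solving (29) for $r_{ck}$, we have

$$r_{ck} = r_c' \beta_k \pm \sqrt{(R_{c.T}^2 - R_{c.t}^2)(1 - R_{k.t}^2)}. \tag{30}$$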





Formula (30) gives explicitly the value which $r_{ck}$ must take to
yield a specified increase in the multiple correlation coefficient. For
computational convenience, the right-hand side may be written
somewhat differently. The term $r_c' \beta_k$ implies the computation of the
multiple regression coefficients for predicting test $k$ from the test
battery. This is not necessary if we have given predicted criterion
scores from the test battery with test $k$ excluded.


Let:

$z$ = the matrix of standard test scores, excluding test $k$,

$K$ = the vector of standard scores for test $k$,

$C$ = the vector of standard criterion scores,

$C_s$ = the vector of predicted criterion scores when test $k$ is
excluded from the battery, and

$r_{ks}$ = the correlation of test $k$ with the predicted criterion
scores.


By definition,

$$z \beta_c = C_s. \tag{31}$$

The correlation of test $k$ with the predicted criterion score is

$$r_{ks} = \frac{\frac{1}{N} K' C_s}{\sqrt{\frac{1}{N} C_s' C_s}}. \tag{32}$$

Substituting (31) in (32),

$$r_{ks} = \frac{\frac{1}{N} K' z \beta_c}{\sqrt{\frac{1}{N} \beta_c' z' z \beta_c}}. \tag{33}$$

By definition,

$$\frac{1}{N} K' z = r_k'. \tag{34}$$

$$\frac{1}{N} z' z = r. \tag{35}$$

Substituting (34) and (35) in (33),

$$r_{ks} = \frac{r_k' \beta_c}{\sqrt{\beta_c' r \beta_c}}. \tag{36}$$

Substituting from (6), (7), and (9) in (36),

$$r_{ks} = \frac{r_c' \beta_k}{R_{c.t}}. \tag{37}$$

Solving (37) for $r_c' \beta_k$,

$$r_c' \beta_k = R_{c.t}\, r_{ks}, \tag{38}$$

which may be substituted in (30).
Suppose now, we prefer to specify a desired increment in the
multiple correlation rather than its square. If we indicate this
increment by $a$, we have

$$R_{c.t} + a = R_{c.T}, \tag{39}$$

or

$$R_{c.T}^2 - R_{c.t}^2 = a(a + 2R_{c.t}). \tag{40}$$

Substituting from (38) and (40) in (30) gives

$$r_{ck} = r_{ks} R_{c.t} \pm \sqrt{a(a + 2R_{c.t})(1 - R_{k.t}^2)},$$

which is the same as Formula A.
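
The result can be checked numerically. The sketch below uses a hypothetical two-test battery (all correlation values are invented for illustration), computes the validity required by Formula A for each sign, and then recomputes the multiple correlation of the enlarged battery from the full correlation matrix. NumPy and the helper name `battery_R` are assumptions of this sketch.

```python
import numpy as np

r = np.array([[1.0, 0.3], [0.3, 1.0]])   # battery intercorrelations (hypothetical)
r_c = np.array([0.40, 0.35])             # battery validities (hypothetical)
r_k = np.array([0.20, 0.25])             # correlations of test k with the battery
a = 0.02                                 # desired increase in the multiple correlation

beta_c = np.linalg.solve(r, r_c)         # equation (6)
beta_k = np.linalg.solve(r, r_k)         # equation (9)
R_ct = np.sqrt(r_c @ beta_c)             # equation (8)
R_kt = np.sqrt(r_k @ beta_k)             # equation (11)
r_ks = (r_c @ beta_k) / R_ct             # equation (37)

def battery_R(r_ck):
    """Multiple correlation of the battery including test k, from equation (4)."""
    rho = np.block([[r, r_k[:, None]], [r_k[None, :], np.ones((1, 1))]])
    rho_c = np.append(r_c, r_ck)
    return np.sqrt(rho_c @ np.linalg.solve(rho, rho_c))

half_width = np.sqrt(a * (a + 2 * R_ct) * (1 - R_kt ** 2))
gains = [battery_R(r_ks * R_ct + sign * half_width) - R_ct for sign in (1, -1)]
print(gains)
```

Under these assumptions both signs should give a gain equal to `a` up to rounding error, as equation (29) requires.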
AN EMPIRICAL VERIFICATION OF THE WHERRY-GAYLORD
ITERATIVE FACTOR ANALYSIS PROCEDURE*


ROBERT J. WHERRY 
OHIO STATE UNIVERSITY 


JOEL T. CAMPBELL 
PERSONNEL RESEARCH SECTION, ADJUTANT GENERAL’S OFFICE 


AND 


ROBERT PERLOFF 
OHIO STATE UNIVERSITY 


A comparison of the Wherry-Gaylord iterative factor analysis 
procedure and the Thurstone multiple-group analysis of sub-tests 
shows that the two methods result in the same factors. The 
Wherry-Gaylord method has the advantage of giving factor load- 
ings for items. The number of iterations needed can be reduced by 
doing a factor analysis of sub-tests, re-grouping sub-tests according 
to factors, and using each group as a starting point for iterations. 


Wherry and Gaylord† proposed an iterative method of factor
analysis which identified the factor structure of the test and gave
factor loadings for each item, but which did not require item
intercorrelations. This method has been empirically verified in factor
analyzing an officer-qualification check list. Comparison of results
from the Wherry-Gaylord method with those from a Thurstone-type
multiple-group analysis of sub-tests showed that the two methods give
identical factors after rotation first to orthogonality and then for
meaningfulness.


The analysis was based on ratings of each of 231 Regular Army
officers by his immediate superiors. The check list used had 289
items, each of which was marked on a five-point scale. For the
analysis, unfavorable items were reflected, and each item was
dichotomized as close as possible to the 50 per cent level of difficulty.


*This research was carried out under Contract No. WSW-2508, between the
Department of the Army and Ohio State University. This paper is based on the
final report PRS No. 827 under that contract. The opinions expressed herein
regarding matters relating to the Department of the Army are those of the
authors and are not necessarily official.

†Wherry, Robert J., and Gaylord, Richard H. The concept of test and item
reliability in relation to factor patterns. Psychometrika, 1943, 8, 247-264.


The original Wherry-Gaylord iterative analysis called for (1)
computation of item-test coefficients, (2) grouping of items with
highest coefficients into a new “test,” (3) computation of coefficients
between each item and the new “test,” and (4) addition of items whose
coefficients increased and dropping of items whose coefficients
decreased. These steps are repeated until stability is reached. Then
those items which had been rejected are formed into a new “test,”
item-test coefficients are computed, and the procedure is repeated as
many times as necessary.
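
The iteration just described can be sketched in Python. This is a loose illustration and not the authors' procedure: it uses product-moment item-test correlations rather than tetrachorics, and it replaces the "coefficients increased / decreased" rule with a fixed cut-off (`threshold`), both of which are assumptions introduced here, as are the function and parameter names.

```python
import numpy as np

def iterate_scale(responses, start_items, threshold=0.5, max_iter=20):
    """Refine one group of items until its membership stops changing."""
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    current = sorted(set(start_items))
    for _ in range(max_iter):
        score = responses[:, current].sum(axis=1)   # score on the current "test"
        coeffs = np.array([np.corrcoef(responses[:, j], score)[0, 1]
                           for j in range(n_items)])
        # Assumed rule: keep every item whose item-test coefficient reaches the cut-off.
        updated = [j for j in range(n_items) if coeffs[j] >= threshold]
        if updated == current or not updated:
            break                                   # membership is stable
        current = updated
    return current
```

Items never retained by any scale would then be pooled into a new starting group and the same function applied again, in the spirit of the last sentence of the paragraph above.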


Since the Officer Qualification Form had 289 items, it was felt 
that pre-sorting of items into “factor” piles might reduce the num- 
ber of iterations if the number of factors turned out to be at all 
large. Accordingly, the items were sorted into 13 groups according 
to the following categories: ability, attitude toward work, efficient 
use of subordinates, force, general cultural level, knowledge of pro- 
fession, military appearance, morality, originality, performance, re- 
lation to subordinates, relation to superiors, and sociability. There 
were from 8 to 48 items in each group. 


Each of the 13 sub-tests was used in turn as a starting point for 
iterative analysis. The item-test coefficients computed were tetra- 
chorics between marked-high—marked-low on an item and upper-half 
—lower-half in sub-test score. After approximately 4 iterations in 
each case, it was found that the items selected on the 13 scales fell 
roughly into three groups or patterns. Final sub-test scores for the 
13 iterated scales were intercorrelated and each test was found to 
correlate at least .98 with every other test in one of the three groups 
or patterns. 


Two further steps were taken to see whether other factors were 
obtainable. First, all items not appearing in the 13 final sub-test 
scales were used as a 14th scale. Iteration of this group quickly re- 
sulted in another scale duplicating Group III. Secondly, an examina- 
tion was made of factor loadings and all items with loadings less 
than .30 on any of the three groups were selected as a 15th scale. 
Iteration of this group of items resulted in a sub-test which
contained several new items. It was also tending to iterate toward one
of the three groups, but iteration was stopped before it reached that
stage, and loadings in this test were used to represent a Group IV
factor.


In order to obtain correlations among the four factors, items 
with high loadings on each scale were selected as a test of that fac- 
tor. The subjects were scored on these sub-tests and correlations 
obtained between the sub-test scores. Using these correlations in 
lieu of actual inter-factor correlations, a transformation matrix was 
secured and applied to the item sub-test coefficients to obtain
orthogonal factor loadings. About 20 of the 289 resulting communalities
were above 1.00. Examination showed that in every case these were
items used in defining the final sub-tests, resulting in upward
contamination of their factor loadings. This contamination (contamination
= $\sqrt{1/n}$) was removed, and the transformation matrix reapplied.
This reduced all except 6 items to a communality of less than 1.00.
For these remaining items a proportional decrement across the
loadings was used to reduce the communalities to unity.


After orthogonality and communalities of unity or below were
achieved, the four factors were rotated for meaningfulness.


The large number of items made complete plotting infeasible 
since the dots could not be labeled and seen. This necessitated se- 
lection of only the highest positive and negative combinations for 
plotting and the securing of actual rotated loadings mathematically. 
Each succeeding transformation matrix was superimposed on the 
preceding transformation matrix, and the resulting matrix applied 
to the original oblique loadings to secure the final orthogonal ro- 
tated loadings. 


Descriptions of the four factors: 
Factor I had its highest loadings on the following 10 items: 


.90  280. No attempt to help others (R)*
.89  155. Lacking in sincerity (R)
.82  207. Does not secure loyalty of subordinates (R)
.80  171. Feels mistreated (R)
.80  107. Hated by subordinates (R)
.79  139. Harbors grudges (R)
.78  130. Not cooperative (R)
.78  158. Caustic in remarks (R)
.78  204. Cannot apply knowledge (R)
.78  237. Selfish in motives (R)


*“R” indicated item was reflected in scoring. 
Reduced to a description, we have an officer who “is sincere,
helpful, cooperative, and satisfied and who engenders liking and respect.”
This factor seems best described as Proper Attitude Toward the Job. 


Factor II had its highest loadings on the following 10 items:


.92   42. Establishes cordial relations
.91  100. Well liked by fellow officers
     263. Knows his subordinates
      19. Affable and genial
      50. Makes duty assignments according to ability
      53. Gets along well with subordinates and superiors
      68. Pleasing personality
      70. Has vitality
      16. Assigns men properly
      20. Physical endurance

These items seem to describe an officer who “is genial, cordial, 
and well liked by subordinates, fellow officers, and superiors, and 
who handles his relationships with them in a satisfactory manner.” 
This factor is therefore identified as Successful Interpersonal
Relationships.


Factor III has its ten highest loadings on the following items: 


 .72  182. Makes bold and quick decisions
 .68  119. Lacks ability (R)
 .68  250. Has little force (R)
 .65  ag7. Fails to exercise initiative (R)
 .64  265. Forceful
 .63   93. Good leader
 .62   60. Needs to assert himself (R)
 .61   29. Physically unimpressive (R)
-.60  225. Quiet
 .60  249. Timid (R)


Boiled down to a thumbnail description, the items say such an 
officer “is bold, forceful, and quick to lead or take the initiative; 
never quiet, timid, or afraid to assert himself.” This factor is there- 
fore identified as Forceful Leadership and Initiative. 


The ten highest loadings on Factor IV were 


.49   12. Does not know his job (R)
.76  260. Competent
.75  241. Persevering
.74  239. Mentally alert
.73  201. Makes little progress toward objectives (R)
.78  284. Criticizes superiors in front of junior officers (R)
.70  281. Not well informed concerning his duties (R)
.69  165. Ignorant (R)
.69  199. Shirks responsibility (R)
.66   65. Seeks easiest assignments (R)


These items clearly depict an officer who “is competent, alert, 
informed, and persevering; one who gets things done and likes to do 
them.” This factor is therefore identified as Job Competence and 
Performance. 


Since it seemed that four factors were quite few to describe 
almost 300 items, and since the original thirteen sub-tests had “‘ap- 
peared” to be different, it seemed wise to run a factor analysis of the 
scores on the 13 selected original sub-tests. 


Table 1 shows the intercorrelations among the 13 original sub- 
tests before iteration. A group factor analysis* yielded 4 factors. 
Rotated first to orthogonality and then for meaningfulness, the load- 
ings are those shown in Table 2, with residuals in the upper half of 
Table 1. 


Factor I had its highest loadings on 


.69  12. Relation to Superiors
.58   2. Attitude toward work
.54   8. Morality

29 861. Ability 

325 10. Performance 


This factor is quite easily identified as being the same as Factor 
I from the iteration process, namely Proper Attitude Toward the Job. 


Factor II had its highest loadings on 


.61  11. Relation to Subordinates
.60   3. Efficient use of Subordinates
.50   7. Military appearance
.57  13. Sociability
.44   9. Originality
.41   8. Morality


Again the factor is clearly identifiable as the Factor II from the 
iterative process, namely Successful Interpersonal Relationships. 


Factor III had its highest loadings on 


.75   7. Military appearance
.57   1. Ability
.43   4. Force
.29   6. Knowledge of Profession
.28   9. Originality
.28   5. General Cultural Level
.38   3. Efficient Use of Subordinates
.25  10. Performance


*Thurstone, L. L. Multiple Factor Analysis. Chicago: Univ. of Chicago
Press, 1947.


Inspection of both the topics and items contained in Factor III
from the iterative process (the 10 highest items came from sub-tests
1, 4, 7, 9, 10, 11, and 13) indicates the identity of this factor with that
one. It is accordingly labelled Forceful Leadership and Initiative.


Factor IV had its highest loadings on 


.77  10. Performance
.75   6. Knowledge of Profession
.74   9. Originality
.67  11. Relation to subordinates
.63   2. Attitude toward work
1 4, Force 

o7 1. Ability 


Again this factor is immediately seen to correspond to Factor
IV from the iterative process, namely Job Competence and
Performance.


This finding clearly points out that such initial groupings, factor 
analyzed, and regrouped according to factors, rather than by topics, 
would form better initial breakdowns for starting the iterative pro- 
cedure, and lead to further time saving in that process. For exam- 
ple, in this case, the four factor divisions of sub-tests would have cut 
the iterative sequences from 13 to 4 and would have led more quickly 
and with equal accuracy to the same result. 


Both the Wherry-Gaylord iterative analysis and the Thurstone 
group method, however, gave the same factors. 


The effectiveness of the suggested use of sub-tests as a short 
cut will obviously be a function of the homogeneity of the sub-tests 
editorially selected. The consequences of varying degrees of homo- 
geneity are indicated below: 


(a) If completely heterogeneous (implying no success in
editorial judgment), factor analysis would yield one factor and we
would thus have the original Wherry-Gaylord suggestion of using
total test score. In this instance, nothing would be gained by using
the suggested short cut, while the only loss would be in terms of time
spent.


(b) If partly homogeneous with some heterogeneity (implying 
some editorial success) we would expect results similar to those in 
the present study. 


(c) If completely homogeneous within a sub-test but heterogeneous 
with respect to certain other sub-tests (better editorial acumen 
than that represented by the present study) the factor analysis would 
yield our findings in clearer form (lower loadings of the sub-tests on 
the other factors). This would be nicer looking, but would make no 
practical difference. 


(d) If each test was completely homogeneous both internally 
and with respect to the other subtests (implying perfect editorial 
ability), factor analysis would yield no common factors and would 
indicate that each sub-test is already an independent factor. (Note: 
If the same is true of a single sub-test, it will not have loadings on 
any of the common factors found, but would be included as a sepa- 
rate factor.) 

Manuscript received 5/15/50 
Revised manuscript received 8/10/50 


TABLE 1 

Intercorrelations and Residuals for 13 “Guessed” Factors* 

Sub-Test                            1   2   3   4   5   6   7   8   9  10  11  12  13 
 1 Ability                          —  00 -02 -03  05  01 -06  03  00  00  01 -03 -01 
 2 Attitude                        72   —  02  05 -04 -01  06 -03  01  01  01 -01 -03 
 3 Efficient Use of Subordinates   57  61   —  01 -05  04 -09  00 -03  07  09  02  00 
 4 Force                           64  68  72   — -06 -01  07  02 -01  00  00  04 -03 
 5 General Cultural Level          68  51  52  53   —  00  03  01  02 -05 -04  01  04 
 6 Knowledge of Profession         76  72  71  73  61   —  00  00 -01 -03 -02 -03 -02 
 7 Military Appearance             43  41  56  65  49  47   —  01 -07  08  00  03 -05 
 8 Morality                        60  71  58  59  49  61  37   — -01 -04  01 -02  04 
 9 Originality                     71  75  75  79  66  81  52  68   —  03 -07  07  00 
10 Performance                     79  82  73  75  57  80  51  64  86   — -01 -02 -06 
11 Relation to Subordinates        57  71  88  70  52  69  47  71  74  73   —  04 -02 
12 Relation to Superiors           64  81  56  60  51  63  38  73  69  73  71   — -03 
13 Sociability                     42  58  59  51  47  51  38  67  82  51  70  59   — 

*Intercorrelations are shown below the principal diagonal, and residuals above. The decimal 
point has been omitted in all entries. 
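
The residuals in Table 1 can be checked against the loadings in Table 2. The following is a minimal Python sketch, assuming that the four factors are orthogonal, so that the correlation reproduced for two sub-tests is the sum of the products of their loadings. The function and variable names are illustrative and are not part of the original study; only three sub-tests are used.

```python
# Loadings on factors I-IV for three sub-tests, copied from Table 2
# (decimal points restored).
loadings = {
    "Ability": [0.39, 0.08, 0.57, 0.57],
    "Force": [0.12, 0.39, 0.43, 0.61],
    "Knowledge of Profession": [0.22, 0.22, 0.39, 0.75],
}

# Observed intercorrelations for the same pairs, copied from Table 1.
observed = {
    ("Ability", "Force"): 0.64,
    ("Ability", "Knowledge of Profession"): 0.76,
    ("Force", "Knowledge of Profession"): 0.73,
}


def reproduced(a, b):
    """Correlation implied by orthogonal factors: sum of loading products."""
    return sum(x * y for x, y in zip(loadings[a], loadings[b]))


def residual(a, b):
    """Observed correlation minus the correlation reproduced by the factors."""
    return observed[(a, b)] - reproduced(a, b)


# Residuals rounded to two decimals, the precision used in Table 1.
residuals = {pair: round(residual(*pair), 2) for pair in observed}
```

Under the orthogonality assumption, the three rounded values agree with the corresponding entries above the diagonal of Table 1 (-03, 01, and -01).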
TABLE 2 

Factor Loadings* 

                  I              II            III            IV 
Sub-Test    Conscientious     Personal      Force and         Job          h² 
              Attitude        Relations     Initiative    Performance 
    1            39              08             57             57          81 
    2            58              23             21             63          83 
    3            11              60             38             49          76 
    4            12              39             43             61          72 
    5            17              27             38             49          49 
    6            22              22             39             75          81 
    7            03              58             66             10          78 
    8            54              41             09             49          71 
    9            10              44             38             74          90 
   10            35              20             35             77          88 
   11            22              61             07             67          87 
   12            69              25             12             54          84 
   13            3?              57            -01             45          64 


*The decimal point has been omitted from all entries. 
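
The h² column of Table 2 can be read as the communality of each sub-test. The sketch below is a minimal Python check, assuming orthogonal factors so that the communality is the sum of the squared loadings; the names are illustrative, and only rows whose entries are clearly legible are included.

```python
# Rows of Table 2 as (loadings on I-IV, tabled h2), in hundredths.
table2 = {
    1: ([39, 8, 57, 57], 81),
    3: ([11, 60, 38, 49], 76),
    4: ([12, 39, 43, 61], 72),
    5: ([17, 27, 38, 49], 49),
    6: ([22, 22, 39, 75], 81),
    7: ([3, 58, 66, 10], 78),
    9: ([10, 44, 38, 74], 90),
    11: ([22, 61, 7, 67], 87),
    12: ([69, 25, 12, 54], 84),
}


def communality(loadings_in_hundredths):
    """Sum of squared loadings, assuming the factors are orthogonal."""
    return sum((value / 100) ** 2 for value in loadings_in_hundredths)


# Largest gap between a computed communality and the tabled h2.
largest_gap = max(
    abs(communality(row) - h2 / 100) for row, h2 in table2.values()
)
```

For these rows the computed communalities differ from the tabled h² only by rounding.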
THE CENTRAL INTELLECTIVE FACTOR* 
The proof of the existence of “g” is more than a methodological 
problem and concerns the very core of psychological theory. The 
principles of noegenesis should be identified experimentally before 
a final opinion can be rendered about “g.” Many general factors 
isolated in different studies are not necessarily “g.” In the present 
study a second-order unrotated general factor has been identified 
by using Thurstone’s method. It seems possible to identify this 
factor with “g.” In the first order, factors that seem to represent 
the first and second principles of noegenesis have been found. The 
existence of synthetic and analytic activities and their interplay 
in intellectual performances is indicated. The relation of likeness 
is of great interest in explaining cognitive abilities and is isolated 
both as a first and second order factor. For the final identification 
of factors the search should be conducted beyond the elementary 
listing of tests. The dynamic aspects underlying factors are more 
meaningful than their simple description. The second order gives 
indications that allow for a better interpretation of fundamental 
psychological activities. 


Introduction 

Since Spearman defined “g” many factorial studies have been 
published in relation to intelligence. No satisfactory agreement has 
yet been reached on issues germane to the problem of the nature of 
intelligence, in spite of the different methods and tests tried. Some 
factorialists have often been satisfied with the simple enumeration 
of variables. Nevertheless, factor analysis has more to its credit 
than merely cataloguing factors. 

Spearman’s interest was to delimit and define “g” as a general 
factor best expressed by the principles of noegenesis† and by what 


*The final part of this study was carried on at the Psychometric Labora- 
tory, University of Chicago, during the year 1946-47. This part of the research 
was completed under a State Department Grant and a Frank Fund Fellowship. 

The author is indebted to Dr. L. L. Thurstone for his assistance and to Mr. 
V. S. Tracht for his help in preparing the manuscript. 

†As described by Spearman the principles of noegenesis refer to: 1) “a 
person has more or less power to observe what goes on in his own mind,” 2) 
“when a person has any two or more ideas (using this word to embrace any 
items of mental content, whether perceived or thought of), he has more or less 
power to bring to mind any relations that essentially hold between them,” and 
3) “when a person has in mind any idea together with a relation, he has more 
or less power to bring up into mind the correlative idea” (27). The theoretical 
postulation and a complete discussion of these three principles are found in The 
Nature of Intelligence and the Principles of Cognition (25). 
he calls abstraction, adding that the analytic procedures “tend to 
load noegenetic processes with ‘g’” (30). His interest in other fac- 
tors was secondary and he cautioned against those conditions that 
would disturb the tetrad criterion and make “g” disappear. Proofs 
of the existence of “g” were given by Holzinger (17) and Brown 
(6), among others. 

Some British psychologists and Spearman himself spoke of 
overlapping factors, and in this endeavor mainly the verbal and the 
space factors were accordingly defined (31, 32, 33, 13). “g” was in- 
terpreted as “general fund of energy,” will power, maturation, neu- 
ral plasticity, condition in the blood, neural energy, chance, and so 
on (16, 27, 30). 

In Thurstone’s theory the existence or non-existence of a gen- 
eral factor is not previously postulated. The main interest is to dis- 
cover the number and properties of the factors that reproduce the 
given raw data, mainly the correlational matrix (38). As several 
experimental studies have shown, if there is a general factor it will 
become evident by using Thurstone’s method (4, 5, 15, 36, 39). 

The early procedure of rotating the factors while keeping them 
orthogonal was modified by introducing oblique factors. These are 
linearly independent but statistically related. The study of the sec- 
ond order—that is the analysis of the correlations between the pri- 
maries—has been little explored up to the present. The statistical 
and methodological implications have been partially discussed (24, 
36, 38), and probably it is in the second order where the interaction 
of primary factors is to be further clarified. Nevertheless, any sec- 
ond order findings should be carefully interpreted on account of our 
lack of experience and of the theoretical and methodological assump- 
tions involved. 

Thurstone (35, 36) described several primary factors and his 
pupils and associates in a series of different studies confirmed their 
existence and characteristics. Some of these factors have been lately 
split into several others, or their properties redefined, for instance 
the perceptual and the space factors, etc. (37). 

It is customary to present the results obtained by Thurstone 
et al. as incompatible with those reached by Spearman and associ- 
ates. Since our problem is related to this, we shall review some of 
the pertinent bibliography on the subject. 

Blakey (4), using Thurstone’s method, reworked Brown and 
Stephenson’s data and found a verbal, a space, and a perceptual speed 
factor, plus another variable that may be interpreted as Spearman’s 
central intellective factor or as the effect of maturation on the sub- 
jects. In a study of non-verbal tests the same author (5) reports a 
factor which is general for the special battery employed. 


A general factor and others were also found by Wright (39) in 
analyzing the Stanford Binet scale. This author believes that this 
factor is due to the effects of maturation rather than “g.” This in- 
terpretation was criticized by Burt and John (10) who, by analyzing 
the Terman Binet scale, discovered a general intelligence factor which 
may “be regarded as indicating the particular characteristics which 
the tests were designed to measure.” Whether this interpretation is 
accepted or not, it is related to Burt’s theoretical position, wherein 
factors are mainly principles of classification (9, 11). 

Thurstone (36) reported that the correlations among the pri- 
mary factors indicate that “each of the primary factors can be re- 
garded as a composite of an independent primary factor and a gen- 
eral factor which it shares with other primaries.” Swineford (34) 
stated that this second-order general factor is similar to Holzinger’s 
general factor. 
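
Thurstone's statement can be illustrated numerically. The following is a minimal Python sketch, assuming standardized primaries, a general factor that is uncorrelated with the independent parts, and independent parts that are uncorrelated with one another. The second-order loadings are hypothetical values chosen for illustration; they are not results of any study cited here.

```python
import math

# Hypothetical second-order loadings of three primaries on the general factor.
general_loadings = {"V": 0.8, "N": 0.6, "S": 0.5}


def unique_weight(g):
    """Weight of the independent part when the primary has unit variance."""
    return math.sqrt(1.0 - g * g)


def implied_correlation(g_p, g_q):
    """Correlation between two primaries that share only the general factor."""
    return g_p * g_q


# Correlations among the primaries implied by the composite model.
implied = {
    (p, q): implied_correlation(general_loadings[p], general_loadings[q])
    for p in general_loadings
    for q in general_loadings
    if p < q
}
```

Under these assumptions the whole table of correlations among the primaries is determined by one loading per primary, which is what allows a single second-order general factor to account for it.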

Carroll (12) described a second-order general factor common 
to all the primaries, and Goodman (14) reported a similar result. 
Mellone (20) also described a general factor, present in 7-year-old 
boys and girls, by using Thurstone’s method and orthogonal rota- 
tions, and Balinsky (2) found a general factor, which the author 
calls “g,” at 9 and at 50-59 years of age. 

Alexander (1) reports several factors, one of them common 
to all the tests in the battery. But Alexander’s criterion for rotation 
was to pass an axis through a cluster of tests previously interpreted 
as tests of “g.” The lack of an independent criterion for the deter- 
mination of “g” makes his conclusions less valid than they would otherwise be. 
When Yela (40) analyzed Alexander’s correlational matrix by means 
of Thurstone’s method, he did not find a general factor in the first 
order. The second order indicates the existence of a general factor 
common to the cognitive abilities described in the first order. 

It is interesting to report that Spearman (28) in reworking 
Thurstone’s data, found a general factor plus a Verbal (V), a Space 
(S), a Number (N), and a Memory (M) factor. Induction (I), 
Reasoning (R), and Deduction (D) did not appear in the results. 

Reyburn and Taylor (22) advanced the idea that “g” might not 
represent a single factor. 

A summary of these paragraphs indicates: 
(1) There is no basic disagreement in relation to factors such 
as V, M, S, etc. 

(2) A general factor has been found by using Thurstone’s 
method of analysis. 

(3) This factor does not always agree with the definition of 
“g” as abstraction and noegenesis. 

(4) The characteristics of “g” as given by Spearman and asso- 
ciates, the results of experimental work on the Primary Mental Abil- 
ities (mainly factors I, R, and D), and the findings of Spearman (28) 
and others seem to indicate that, experimentally considered, “g” 
might not be a unitary factor and that from the theoretical stand- 
point the study of the relationships between some of the Primary 
Mental Abilities and “g” may prove valuable in explaining the dy- 
namics of intellectual performances. 

In relation to this last point we planned the present study. The 
study of the second order might give valuable clues for the interpre- 
tation. 

It is difficult to find a large battery of tests that will fulfill the 
tetrad criterion. The conditions required have been stated by Spear- 
man and his associates. The general factor found by using tech- 
niques other than Spearman’s should not necessarily be equated to 
“g.” The proof or disproof of “g” is more than a mathematical mat- 
ter, as indicated by Brown and Stephenson (7). Consequently it 
seems that its acceptance or rejection pertains to psychological the- 
ory. The crux of the problem is: Are the laws of noegenesis and ab- 
straction, as understood by Spearman, proved facts or do they still 
belong to the realm of theory? If these concepts are realities in an 
empirical sense they should be experimentally verifiable, whether we 
use factor analysis or any other convenient psychological technique. 
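
The tetrad criterion mentioned above can be stated concretely. For any four tests a, b, c, and d, a single general factor implies that tetrad differences such as r_ab r_cd - r_ac r_bd vanish. The following is a minimal Python sketch that lists the tetrad differences of a correlation matrix; the function name and the example loadings are illustrative assumptions, and with real data the differences would be judged against their sampling error rather than against exact zero.

```python
from itertools import combinations


def tetrad_differences(r):
    """Return the three tetrad differences for every set of four variables."""
    differences = []
    for a, b, c, d in combinations(range(len(r)), 4):
        differences.append(r[a][b] * r[c][d] - r[a][c] * r[b][d])
        differences.append(r[a][b] * r[c][d] - r[a][d] * r[b][c])
        differences.append(r[a][c] * r[b][d] - r[a][d] * r[b][c])
    return differences


# Hypothetical loadings on one general factor; each correlation is the
# product of the two loadings, so every tetrad difference should be zero.
g = [0.9, 0.8, 0.7, 0.6, 0.5]
one_factor = [
    [1.0 if i == j else g[i] * g[j] for j in range(len(g))]
    for i in range(len(g))
]
largest_tetrad = max(abs(t) for t in tetrad_differences(one_factor))
```

A large battery rarely behaves like this idealized matrix, which is why the text notes that it is difficult to find one that fulfills the criterion.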

The battery of tests employed for such a study should have tests 
of “g,” I, R, and D, avoiding insofar as possible the presence of vari- 
ables that may mar the results on account of their complexity. We 
thus selected tests that were considered by previous authors as good 
measures of these factors. 


The Tests 
Tests No. 1 and No. 2: Geometrical Forms No. 1 and Geometrical 
Forms No. 2, respectively, similar to the Otis and Thurstone tests 
(36) of the same name. Spearman reported that these tests are sat- 
urated in “g” and Thurstone has shown that they are loaded in the I 
and S factors. The difference between these two tests lies in the fact 
that in the first one there is a greater use of words than in the second 
one. 

Test No. 3: Number Series. Thurstone (35) reported the high sat- 
uration of this test in the I factor and in N, V, D, and M. 

Test No. 4: Verbal Analogies. Spearman reported the saturation of 
this test in “g” and Thurstone indicated high loadings in V, P, and 
to a lesser extent in S. 

Test No. 5: Pedigrees. This test is similar to the one used by Thur- 
stone (36) who reported its loadings in I and in M. 

Test No. 6: Inventive Synonyms. This test has been considered as a 
good measure of “g” and Thurstone (35, 36) indicated that it is 
loaded in V, W, P, and D. 

Test No. 7: Group of Figures. Taken from the “City and County of 
Newcastle-upon-Tyne Education Committee Intelligence Test.” This 
test is similar to Thurstone’s Figure Grouping test (36) reported by 
this author as loaded in P and I. Several rows of drawings are given 
and the subject has to mark in each row the design that is different 
from the rest of the designs in the same row. 


Test No. 8: Classification of Figures. Taken from the same battery 
as the previous test. This test consists of several rows of designs 
divided into three parts. The designs in the first and second parts 
are equal in some way. In the third part four designs are given, two 
of them similar to the drawings in the first part of the row and two 
similar to the drawings in the second part. The subject has to give 
a number—1 or 2—to each of the four designs according to their simi- 
larity to the designs in the first or second part of the row. Thur- 
stone (35) reported this test as loaded in V, R, and W. 


Test No. 9: Arithmetical Reasoning. Similar to Thurstone’s (36) 
and to Burt’s (8) tests of the same name. Spearman indicated that 
tests of this kind are essentially loaded with the “relation of conjunc- 
tion” and are given as “g” tests. According to Thurstone (36) it is 
loaded in R, I, V, and N. 

Test No. 10: Absurdities. Similar to Thurstone’s test of the same 
name (36) with loadings in V and I. 

Test No. 11: Group of Letters. Thurstone (36) has shown its satu- 
ration in I. 

Test No. 12: Code. This test was taken from the “City and County 
of Newcastle-upon-Tyne Education Committee Intelligence Test.” A 
number of designs are given, each one indicating a particular letter. 
In the lower parts these same designs are repeated, although differ- 
ent parts have been taken from them. The subject has to indicate to 
which letter each of the designs corresponds. Tests of this nature 
have been regarded as correlated with “g.” 

Test No. 13: Secret Writing. Similar to Thurstone’s test of the same 
name (36). This variable shows loadings in I and to a lesser extent 
in S and M. 

Test No. 14: High Numbers. Similar to Thurstone’s test of the same 
name (36). This author has indicated its saturation in N, S, and I. 

Test No. 15: Numerical Judgment. Taken from Thurstone’s Pri- 
mary Mental Abilities (35), where it shows loadings in R, I, and N. 

Test No. 16: Three Higher. Taken from Thurstone’s Factorial Stud- 
ies of Intelligence (36) where it is mainly loaded in N and I. 

Test No. 17: Letter Series. Thurstone has shown its saturation in 
factor I (36) and it also has been reported as a good measure of 
“g.” 

Test No. 18: Directions. Similar to Thurstone’s test of the same 
name (36) which is loaded in I and V. 

Test No. 19: Areas. Similar to Thurstone’s test of the same name 
(35) which is reported as saturated in I, and to a lesser extent in 
R, factor 11, M, and S. 

Test No. 20: Form Analogies or Pattern Analogies. This test was 
reported as loaded in “g” and Thurstone indicated (35) its satura- 
tion in P, I, R, and D. 

Test No. 21: Number Pattern. Taken from Thurstone’s Factorial 
Studies of Intelligence (36) where it is given as a test of I and P. 

Test No. 22: Inventive Opposites. It has been considered as loaded 
in “g.” Thurstone indicated its saturation in V, W, R, M, and fac- 
tor 10. 

Test No. 23: Reasoning. Taken from Burt (8). This test is similar 
to other tests of the same name, where problems of different com- 
plexity are verbally presented. 

Test No. 24: Reasoning and Inferences No. 1. This test is similar to 
the Reasoning test (Syllogisms) as given by Thurstone (35). This 
author has shown its loading in D, V, R, and I. 

Test No. 25: Reasoning and Inferences No. 2. Similar to Thurstone’s 
test of reasoning (36). According to this author it is loaded in N. 


Since many of our tests are not pure tests (in a factorial sense), 
factors other than I, R, and D and “g” are present, as can be corrob- 
orated by studying the corresponding tables given by Thurstone (35, 
36). 

Since our population consisted of Spanish-speaking persons, 
some of the tests were translated into Spanish, when a translation 
was not already available. Before administering the tests to the popu- 
lation here studied, the whole battery was tried with a smaller group 
of 50 children, of both sexes and between 11 and 14 years of age, to 
determine the most convenient norms of administration. 


Population 

Three hundred eighty-four children between 11 and 14 years of 
age were administered the whole battery in four different sessions, 
each of a duration not longer than 45 minutes. The whole testing 
was completed for each group of approximately 40 subjects within a 
week. Half of our subjects were boys and half girls, each age and sex 
being represented by the same number of subjects. 

The selection of the population was done as follows. From the 
complete school population of a city of approximately 300,000 per- 
sons, 12 different schools were randomly selected. From each of these 
12 schools a complete list of the 11- to 14-year-old students in the three 
most advanced grades was secured, and from these lists our group 
of 384 subjects was selected at random. There was an equal number 
of subjects for each age level, for each grade, and for each sex. The 
testing was performed by two persons during the year 1943. A de- 
tailed description of the population, including the tests and the edu- 
cational, age, and sex differences, together with the study of the test 
items and other norms has already been published (23). 
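
The balanced design described above can be sketched as a simple allocation. The following minimal Python example assumes that age, sex, and grade were fully crossed, which the text does not state explicitly; the text only guarantees equal numbers for each age, each grade, and each sex. The grade labels are hypothetical placeholders.

```python
from collections import Counter
from itertools import product

AGES = (11, 12, 13, 14)
SEXES = ("boy", "girl")
GRADES = ("grade A", "grade B", "grade C")  # hypothetical labels
TOTAL_SUBJECTS = 384


def allocate(total, *strata):
    """Split a total evenly over every combination of the given strata."""
    cells = list(product(*strata))
    if total % len(cells) != 0:
        raise ValueError("total cannot be divided evenly over the cells")
    return {cell: total // len(cells) for cell in cells}


allocation = allocate(TOTAL_SUBJECTS, AGES, SEXES, GRADES)

# Marginal totals for each age, sex, and grade.
by_age, by_sex, by_grade = Counter(), Counter(), Counter()
for (age, sex, grade), count in allocation.items():
    by_age[age] += count
    by_sex[sex] += count
    by_grade[grade] += count
```

Whatever the crossing actually was, the marginal totals follow from the text: 96 subjects at each age, 192 of each sex, and 128 in each grade.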


Method* 

The correlations among the variables were analyzed by both 
Spearman’s and Thurstone’s methods. The results are given in Tables 
2 and 3. To stabilize the communalities three successive factoriza- 
tions were performed by using first the centroid method and after- 
wards, with the improved communalities, the multiple-group method 
twice in succession, each time using the preceding communality esti- 
mates. The final matrix of transformation and the oblique factor 
matrix are given in Tables 4 and 5, respectively. The correlations 
between the primaries, the orthogonal matrix for the second order, 
the final matrix of transformation for the second order, and the cor- 
relations between the second-order factors are given in Tables 6, 7, 
8, 9, and 10 respectively. 

*The statistical work including the first factorization was done with the 
help of the assistants of the “Instituto de Psicologia Experimental,” University 
of Cuyo, Argentine, 1945-46. The author wishes to express his thanks to them 
for their valuable help. 
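
The idea of stabilizing communalities by repeated factorization can be sketched in a much simplified form. The Python example below assumes a single common factor and uses only the first centroid factor in every round, whereas the study extracted several factors, first by the centroid method and then twice by the multiple-group method. The starting estimate (the highest correlation in each row) and all names are illustrative assumptions.

```python
import math


def centroid_first_factor(r):
    """First centroid loadings: column sums over the root of the grand sum."""
    column_sums = [sum(row[j] for row in r) for j in range(len(r))]
    return [s / math.sqrt(sum(column_sums)) for s in column_sums]


def with_diagonal(r, diagonal):
    """Copy of r with communality estimates placed on the diagonal."""
    n = len(r)
    return [
        [diagonal[i] if i == j else r[i][j] for j in range(n)]
        for i in range(n)
    ]


def stabilize_communalities(r, rounds=3):
    """Refactor `rounds` times (at least once), reusing the last estimates."""
    n = len(r)
    # Starting estimate: highest absolute off-diagonal correlation per row.
    h2 = [max(abs(r[i][j]) for j in range(n) if j != i) for i in range(n)]
    loadings = []
    for _ in range(rounds):
        loadings = centroid_first_factor(with_diagonal(r, h2))
        h2 = [a * a for a in loadings]
    return loadings, h2


# Hypothetical one-factor example: each correlation is a product of loadings.
true_loadings = [0.8, 0.7, 0.6, 0.5]
example_r = [
    [1.0 if i == j else true_loadings[i] * true_loadings[j] for j in range(4)]
    for i in range(4)
]
loadings_after_three, h2_after_three = stabilize_communalities(example_r, 3)
```

In this idealized example each round brings the loadings closer to the values that generated the correlations, which is the sense in which the communalities are stabilized.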


The Factors 


Factor A: 
Verbal Analogies 04 
Absurdities .30 
Code .32 
Three Higher .28 
Pattern Analogies .25 
Reasoning and Inf. No. 1 .63 
Number Pattern .20 
Secret Writing .20 


Analogies, Absurdities, and Reasoning and Inferences No. 1 in- 
volve problems that are verbally presented; Code and Pattern Anal- 
ogies require the use of words to a lesser extent—the problems are 
presented graphically—while Three Higher requires the manipulation 
of numbers. This indicates that the nature of this factor goes be- 
yond the mere presentation of the problems. 

In Analogies a certain statement is given: “black is to white” 
and another similar incomplete statement such as: “red is to... blue, 
green, yellow, gray.” The activity required implies a knowledge of 
the first condition and of the restrictions imposed upon the solution 
by the partially stated second condition. In all cases the subject has 
to combine and discern from among the different answers the one 
that completes each test item in the best way. The critical qualities 
of the terms to be related have to be isolated, but above crude analy- 
sis a further combination has to be performed among the conflicting 
statements. 

In Absurdities various relations are given, the understanding 
of each one of them and of the whole system being a basic condition 
for a good solution. The correctness of the answer depends on both 
the analytical and synthetic activities and on the interaction of sev- 
eral different Gestalts. 

The explanation given for Verbal Analogies is also valid in the 
case of Pattern Analogies. The problems are essentially of the same 
nature and the differences are due to the verbal presentation in the 
former. This last aspect is accounted for by the loading of Verbal 
Analogies in Factor F. 

Reasoning and Inferences No. 1 is the most heavily loaded test. 
To judge whether a conclusion is right or wrong it is necessary to 
analyze the premises and their interrelationships. Nevertheless the 
piecemeal study of each part is insufficient for the solution, which re- 
quires a combination of the parts into a more inclusive whole. The 
same activity that was required for the solution of the previous tests 
is important here. Unless the subject can apprehend each particular 
sentence and the conflicting or non-conflicting statements in the whole 
syllogism, the correct answer is practically impossible. Again, the 
greater the flexibility of the subject in combining the parts in differ- 
ent ways, the easier the solution. 

In Code the testee has to indicate to which of the complete pat- 
terns the partially given designs belong. He has to find the proper 
solution, due account being given to the distortions imposed upon the 
drawings; that is, one possible answer is discarded in terms of a 
better one. 

In Three Higher the manipulation of numbers is required. One 
relation is given as the basis for the solution of the test; that is, any 
number three units higher than the previous one. The subject has to 
isolate from the consecutive numbers those that fulfill the stated con- 
dition; that is, some answers are discarded in favor of others, and 
so on. 

Secret Writing and Number Pattern have a small loading in 
Factor A, but the two tests will be analyzed in more detail when ex- 
amining Factor C. 

From the previous description it seems that the processes under- 
lying Factor A are not only structurally complex but also imply a 
complicated dynamic system. Not only are analysis and synthesis pres- 
ent, but their interplay seems to be a fundamental characteristic in 
defining the properties of this variable. 

Factor A seems to agree with what has been usually called rea- 
soning. In our case it stresses the essentially dynamic character of 
the process, and mainly the plasticity required to perform such an 
activity in the rather complex situations here presented. In all these 
cases there are conflicting forces at play. The solutions will be easier 
for the subject the more plastic and consequently the more “Gestalt 
free” he is (24). 


Factor C: 
Secret Writing .55 
Geometrical Forms No. 2 .44 
Number Patterns .38 
Geometrical Forms No. 1 .36 

These four tests are presented in different contexts (words, num- 
bers, drawings), indicating that the activity represented by Factor C 
transcends the immediate test material. 

Overlapping figures are shown in both Geometrical Forms No. 
1 and No. 2, and the subject has to indicate some of the possible rela- 
tionships among these drawings; for instance, which is the number 
inside the rectangle but outside the triangle, and so on. In Geometri- 
cal Forms No. 2 the problems are stated with a minimum use of 
words. In a general instruction for all the problems the subject is 
told to mark the space surrounded by a solid line but outside the 
dotted lines. In both tests the common characteristic task is to locate 
exactly the point in which the figures relate to each other according 
to the given instructions, the items being previously defined. 

In Number Patterns a series of numbers is given and for the so- 
lution of each item the testee must discover the relationship among 
the numbers. Here as in the two previous tests the key for the solu- 
tion seems to be centered upon discovering or noticing the relation- 
ships among the parts. The latter is implicitly given in the problems 
although it is not explicitly stated. 

In Secret Writing a similar activity is required. The testee has 
to discover in what manner certain letters are related to certain 
numbers. The problems here are more complex than in the previous 
tests. This variable is also loaded in Factor A. 

In all these tests the subject must find the relationship among 
the parts. This activity is seemingly different from the one required 
in those cases where a relationship and one of the items are given. 
For instance, it is a different task to discover the relation between 8 
and 9—8 smaller than 9—than to produce a number “greater than” 
8. In the first case both items 8 and 9 are given and the person has 
to find how they are related. In the second case, one item—8—and a 
relation—greater than—are given and the subject has to find a solu- 
tion that fulfills the condition of “greater than.” It seems that the 
tests included in this factor are better represented by the former 
kind of activity. A careful testing of the hypothesis would not be a 
difficult problem. Tests of an exceedingly simple nature should be 
employed together with others of increasing difficulty, always keeping 
the crucial functions as free from contamination as possible. 


Factor E: 
Pedigrees .87 
Directions .36 
Reasoning .20 
Analogies .20 

Strictly speaking this factor appears in the present study as a 
doublet. Pedigrees and Directions are loaded only in Factor E, while 
Analogies and Reasoning show saturation in Factor F as well. 


In Pedigrees a genealogical tree is presented and the subject 
has to answer problems such as: “John is married to....” The sub- 
ject is given an item—John—a relation—is married to—and has to 
discover the other item that will satisfy the relationship. In the tests 
representing Factor C the nature of the problem was different. In 
that case the subject had to discover the relationship between two 
items; here the relationship is clearly stated and its thorough under- 
standing makes it possible to find the correct solution. 


In the test of Directions the psychological activity involved is of 
a more complex although similar nature. Nevertheless, by the man- 
ner in which the statements are presented and by the nature of the 
problems, it is not as evident as in the previous case. 


Our findings seem to indicate that it is possible to isolate ex- 
perimentally the second from the third principle of noegenesis. 
Nevertheless it is difficult to prove or disprove this particular hy- 
pothesis until other tests than the ones here used are systematically 
applied. 


The fact that Analogies is slightly loaded in this factor is sug- 
gestive since this test has been considered as a variety of “eduction 
of correlates.” It is also suggestive that the reasoning test shows 
some saturation in Factor E. 


Factor G: 
Group of Letters .41 
Classification of Figures .41 
Numerical Judgment .38 
Group of Figures oO 
Letter Series .26 
Number Series .25 


In Group of Letters the subject has to indicate which group dif- 
fers from the remaining ones. Some of the items—the easy ones— 
can be solved in a rather intuitive way. Probably some kind of ana- 
lytical activity is involved in the solution; but, at least in the easy 
items, the psychological performance does not seem to require a great 
deal of analysis. This is not so in the last and more difficult items of 
the test. The subject seems to reach a solution by using the activ- 
ities that have been described by Mieli (19) as “complexité” and 
“globalization.” 


We could apply a similar reasoning to explain the loadings in 
both Letter Series and Number Series. In the easy items the ana- 
lytical activity is limited and the grouping of the letters or numbers 
is rather obvious. As soon as the problems increase in difficulty the 
grouping becomes less obvious, and the subject has to discover the 
structure characterizing each item. 


Classification of Figures requires the grouping of similar de- 
signs according to a previously given norm. The solution is obvious 
in most of the items and the structures that the subject has to dis- 
tinguish are of a simple nature. Few items, if any, require a com- 
plex analysis. 

In Numerical Judgment, among the several possible answers to 
an arithmetic problem, the testee has to select the correct one with- 
out wasting time working out the exact solution in detail. The subject, 
according to the instructions, has to realize that an operation like 
3.90 by 2.10 gives a result near to the multiplication 4.00 by 2.00. 
Among the different given answers he has to select the one nearer to 
8.00. As a matter of fact, if the testee makes a detailed and careful 
study of each problem, this procedure would probably imply a definite 
fall in the score. 
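
The estimation strategy described for Numerical Judgment can be written out as a short calculation. This is a minimal Python sketch; the answer options are hypothetical, since the actual test items are not reproduced in the text.

```python
def estimate_product(x, y):
    """Approximate a product by rounding each operand to a whole number."""
    return round(x) * round(y)


def nearest_option(options, target):
    """Pick the offered answer closest to the rough estimate."""
    return min(options, key=lambda value: abs(value - target))


# The operation from the text: 3.90 by 2.10 is close to 4.00 by 2.00.
estimate = estimate_product(3.90, 2.10)

# Hypothetical answer options for illustration only.
options = [6.25, 8.19, 10.40, 12.00]
chosen = nearest_option(options, estimate)
```

The rough estimate is enough to single out one option without carrying out the exact multiplication, which is the behavior the instructions reward.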

In Group of Figures, which is also loaded in Factor D, the tes- 
tee is presented with several rows of figures. In each row one of the 
figures is different from the rest and the subject has to underline it. 
The element that disturbs the whole is very obvious and the solution, 
in practically all the items but the last few, becomes evident without 
any careful analysis of the elements. 


On account of this it seems possible to suggest that this factor 
indicates the ability of bringing the parts together into a meaning- 
ful solution. Thus this variable agrees with Mieli’s “globalization.” 
When the problems become more difficult, a greater amount of analy- 
sis may be necessary. Nevertheless in our case and by studying the 
items that have been solved by our subjects, it is quite reasonable to as- 
sume that no complex activity was involved to any great extent. 

It is interesting to observe that our findings agree with Mieli’s 
opinion in the sense that “globalization” and “complexité” may go 
together, “complexité” implying the activity involved in the solution 
of more complex structures. 
Factor F:
Arithmetic Reasoning          .45
Opposites                     .39
Synonyms                      .33
Absurdities                   .30
Syllogisms                    .34
Geometrical Forms No. 1       .46
Reasoning                     .24
Analogies                     .24


All the tests defining Factor F require the use of words. Anal- 
ogies is also loaded in Factors A and E. Absurdities has saturation 
in factor A as well. Synonyms and Opposites are also loaded in Fac- 
tor D, while Reasoning is saturated in E. 

It is interesting to observe that, while Geometrical Forms No. 1 
and No. 2 cluster together in Factor C, only the first one of these tests 
is loaded in F. The main difference between these two tests seems to 
be in the greater use of verbal expressions in Geometrical Forms 
No. 1. 

Arithmetic Reasoning has been previously described by Thur- 
stone as loaded in the Verbal factor. It is interesting to observe that 
the two tests involving arithmetical computations do not cluster to- 
gether in any of our factors. It seems in view of this and of previous 
evidence that arithmetic ability may depend on functions different 
from the mere manipulation of numbers. 


It is curious to find that neither Reasoning and Inferences No. 1
nor Directions has loadings in Factor F. The interpretation of this
fact, as well as the interpretation of factorial findings in general,
should be made in terms of the population of tests and of individuals.
A factor may be absent in a factorial study, but this does not indi- 
cate that it is not influential in the solution of the test. It is obvious 
that the basic hypothesis and the variables employed to test a fact 
are of fundamental importance in the final results and interpreta- 
tions. 


Factor D:
Synonyms                      .68
Group of Figures              .59
Opposites                     .32
Letter Series                 .27


This factor transcends the immediate test context. Synonyms 
and Opposites are loaded in the verbal factor F, while Group of Fig- 
ures and Letter Series show saturation in Factor G. 
In Synonyms and Opposites the subject has to find the answer
which stands in a particular relation to the given words. This rela- 
tionship is that of likeness and its opposite. 

The same reasoning can be applied in Group of Figures, where 
the subjects select the design that is different from the remaining 
ones. In Letter Series the subject forms a group similar to the given 
ones. 

The relation of likeness and its opposite has been considered by 
Spearman as furnishing “the main resource of all mental tests, 
whether sensory or otherwise.” Line (18) has stressed the impor- 
tance of this relation; and in a previous study (24) we discussed the 
part that this relation plays in intellectual performances. As indi- 
cated there, the relation of likeness makes possible the extension of 
conceptual thinking to levels of high complexity. Applied to funda- 
mentals of a concrete or abstract nature it allows for an endless suc- 
cession of further developments. 

The conclusion that two or more things are equal or different
may be reached either at a purely perceptual level (red different
from green) or at levels of high complexity, as when the equality of
two scientific concepts is discovered. Both the “as” and “that” types
of cognition are permeated by this special and somewhat unique
relationship.


Factor B:
High Numbers                  .52
Areas                         .51
Number Patterns               .26


It is difficult to define this factor. In High Numbers the subject
has to indicate which numbers are higher than those placed on both
sides of them. In Areas the testee has to add several white squares
drawn on a larger black square, and state the total number of white
squares in each item. Number Patterns has already been described.

All these tests require the use of numbers, but it is impossible to
identify this factor with certainty. A perceptual element may be
present in the solution of these tests, mainly in High Numbers and
Areas, or perhaps the existence of a space component could also be
defended. In either case, however, we should expect a different
distribution of the loadings of the other tests.


The Second Order 
The factorization of the correlations between the primaries
indicates that these intercorrelations can be explained in terms of
three linearly independent factors (Table 7). The loadings of the
tests in the second-order orthogonal factors are given in Table 11.
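What it means for three factors to "explain" the intercorrelations can be illustrated with a short check: the correlation between two primaries is approximated by the sum of products of their second-order loadings. The sketch below uses a few loadings and correlations as read from Tables 7 and 6 of this copy; the choice of pairs and the variable names are ours, and the transcription should be treated as illustrative.

```python
# Second-order centroid loadings (I', II', III') as read from Table 7.
loadings = {
    "A": (0.57, 0.09, -0.13),
    "C": (0.43, -0.36, 0.23),
    "E": (0.59, -0.21, 0.14),
    "G": (0.73, 0.28, 0.31),
}

# Correlations between primaries as read from Table 6.
observed = {("A", "G"): 0.43, ("C", "E"): 0.35, ("E", "G"): 0.41}


def reproduced(p, q):
    """Correlation implied by the three factors: sum of loading products."""
    return sum(x * y for x, y in zip(loadings[p], loadings[q]))


residuals = {pair: observed[pair] - reproduced(*pair) for pair in observed}
for pair, res in residuals.items():
    print(pair, round(reproduced(*pair), 3), round(res, 3))
```

Small residuals for the chosen pairs are what one would expect if three factors suffice.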

By plotting the loadings of the tests in the first second-order fac- 
tor against “g” loadings we find a very close agreement as indicated 
in Figure 1. The differences found may be attributed, in many of 
these values, to rounding errors. 

The “g” loadings were obtained both by using “g” reference tests and
by means of Thurstone’s formula (38). The differences between these
two procedures were insignificant, and the latter values are employed
here.

By plotting on the sphere the orthogonal system obtained by fac- 
torizing the correlations between the primaries, the rotated second 
order—Table 9—was obtained. The matrix of transformation and the 
correlations between the second-order factors are indicated in Tables 
8 and 10. 
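Assuming, as is usual in this kind of rotation, that the rotated matrix is the product of the second-order centroid loadings and the transformation matrix, the relation among Tables 7, 8, and 9 can be checked numerically. The sketch below uses three primaries with values as read from this copy; the names `centroid`, `transform`, and `printed` are ours.

```python
# Centroid loadings of three primaries on I', II', III' (Table 7).
centroid = {
    "A": (0.57, 0.09, -0.13),
    "D": (0.12, -0.27, -0.35),
    "G": (0.73, 0.28, 0.31),
}

# Transformation matrix (Table 8): rows I', II', III';
# columns are the three rotated second-order factors.
transform = [
    (0.38, 0.40, 0.66),
    (-0.60, -0.54, 0.71),
    (-0.70, 0.74, -0.26),
]

# Rotated loadings printed in Table 9 for the same primaries.
printed = {
    "A": (0.25, 0.08, 0.47),
    "D": (0.45, -0.07, -0.02),
    "G": (-0.11, 0.37, 0.60),
}


def rotate(row):
    """Multiply one row of centroid loadings by the transformation matrix."""
    return tuple(sum(row[k] * transform[k][j] for k in range(3))
                 for j in range(3))


rotated = {name: rotate(row) for name, row in centroid.items()}
for name in rotated:
    print(name, [round(v, 3) for v in rotated[name]], printed[name])
```

For these rows the products agree with the printed rotated loadings to within rounding of the two-decimal entries.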

Factor α is orthogonal to the rest of the second-order factors.
Factor D is, of all the primary factors, the one most heavily loaded
in α. Factors A, C, and E also show a small saturation. It is
interesting to discover that Factor D (likeness) is the one
characterizing this second-order factor, and it is also understandable
how the relation of likeness may permeate the activities implied in A,
C, and E.

Factor β is positively correlated with γ. Factors G, C, and E are
loaded in this factor. C was described as the capacity of finding
relations, while E was tentatively interpreted as eduction of
correlates. These two activities seldom occur in isolation, at least
in ordinary test material; on the contrary, the solution of the test
problems requires the interplay of both activities. Factor α would
refer to a particular kind of relation, that of likeness, while Factor
β would refer to the mechanisms involved in dealing with relations and
eduction of correlates.

In Factor γ cluster Factors A, B, F, and G. F was interpreted
as a verbal component, B was not interpreted, A was characterized
as an indicator of the plasticity implied in reasoning processes, and
G apparently indicates the ability to bring the different parts
together into a meaningful solution.

Although factors A and G may be considered of a similar kind, 
they are nevertheless sufficiently different. One of them indicates 
plasticity —that is trying different kinds of combinations when some 
restriction is involved—while the other seems to refer mainly to the 
process of synthesizing the parts into a solution. 
On account of this description it appears that Factor γ is related
to the synthetic process, including here all the possible reshaping
and redistribution of the elements and the use of the “instruments”
by means of which this process is accomplished. It is notable, for
instance, that the verbal factor has loadings only in γ, which
suggests that the activity defined by the latter develops in a
concrete way and by using whatever symbols the subject may have at
his disposal.


Discussion of Results 

It is known that factorial results should be interpreted in terms 
of the special test configuration. In our particular case the tests 
were selected with a certain purpose and thus it is not strange that 
we do not find factors like P, S, and so on in our final results. In a 
way we eliminated them deliberately from our battery. It should 
also be noted that this process of elimination can not be perfectly 
accomplished, pure tests being still an ideal aim in factorial studies. 

The presence of a verbal factor is easy to understand and was
expected in view of our variables. Induction (I) has been defined by
Thurstone (35) as the ability to “find a rule or principle for each
item of the test,” and R (reasoning) as “successful completion of a
task that involves some form of restriction in the solution,” while D
is of “deductive kind.” Moreover, the same author says that “the tests
which Spearman has designed as best measures of his general factor
seem to be inductive in character.”

The fact that there is a general orthogonal second-order factor
where the tests show the same loadings as in “g” is an interesting
finding. Its identification with “g” is only possible when the tests
involve noegenesis and abstraction, which is the case in the present
study.

This general factor does not explain all our values, since in the
second order three factors are necessary to explain the corresponding
intercorrelations, and since some of the residuals of the original
correlational matrix after partialling out “g” are still appreciably
high. Moreover, it is necessary to remember that the correspondence
of the “g” loadings is for the unrotated second order, as should be
expected in terms of Spearman’s method.

Taking into account the fact that the tests employed are loaded
in Thurstone’s I, R, and D, we have some grounds to believe that
these factors are related to “g.” Considering the characteristics of
the tests employed and the definitions of the factors mentioned above,
this finding is not surprising. Thurstone (35) indicates that most of
Spearman’s tests of “g” seem to be inductive in character (induction
being understood in terms of his I factor), while Spearman (28) has
shown how the functions involved in tests of this nature can be
understood in terms of “g.”

If these results are accepted, the second problem is related to the
possibility of considering “g” as non-unitary, in the sense in which
Reyburn and Taylor (21) referred to it. These authors believe that
“g” can be split into several independent factors. In our case we
have found seven factors in the first order, one of them a verbal
component and another one, Factor B, not precisely identified, neither
of them understandable in terms of “g.”

Some of the factors isolated in this study seem to correspond 
quite closely to variables identified by others. Factor A seems to 
agree with Mieli’s “plasticité” (19) and with a similar variable iso- 
lated by Thurstone (37) as Factor E, and by ourselves (24). Bech- 
toldt (3) described a Factor Y similar to the one we are here dis- 
cussing and Yela (40) found a factor called Z, that can be interpre- 
ted in the same terms. The fact that Yela’s factor has a high corre- 
lation with his R factor—related to the analytical procedure followed 
in the solution of a test—deserves consideration. 

Factor A appears in the second order loaded in Factor α
(interpreted as likeness and its opposite) and, to a greater extent,
in γ, which seems to indicate essentially a synthetic process.

It seems that Factor A should be interpreted as the capacity of 
bringing together several conflicting Gestalts, “the more plastic the 
subject, the more likely he is to be Gestalt free” (24), and it is prob- 
ably related to some temperamental and personality concepts. The 
influence of this factor in reasoning may be of a fundamental kind. 

Factor G was interpreted as similar to Mieli’s “complexité” and 
“globalization” (19) and to the factor defined as “perception of re- 
lations in space necessary for the construction of a whole” (24). In 
the present case this definition should be extended to contexts other 
than the spatial one. Thurstone’s Factor A (37) is similar to the one
we are here describing. 

The solution of an intellectual task requires, in particular in- 
stances, a greater amount of analysis, which explains why Factor G 
is split into Factors β and γ of the second order. Nevertheless the most
important activity here involved seems to be that of synthesis. 

Factor D has been described as perceiving the relation of likeness.
We have dealt with this problem in a different study (24). It is
interesting to discover that this particular kind of relation appears
as defining one of the second-order factors, namely α. If this is
corroborated in further studies, it may give interesting clues as to 
the nature of intelligence and of the processes of reasoning at differ- 
ent levels of complexity. Probably the dynamic interplay of this fac- 
tor with others could be followed and clarified by a systematic study 
of pathological cases. The basic research concerning this relation of 
likeness still requires a great deal of preparatory work. 

Factors C and E appear, together with Factor G, in Factor β.
The first one seems to indicate the ability of finding relations, that
is, what Spearman considers the second noegenetic principle. This
explains in part how it is saturated in Factor α.

Factor E was tentatively interpreted as the ability to educe
correlates and is also loaded in Factor β, with some small saturations
in γ and α. While the variables in Factor γ seem to deal with more
concrete psychological performances, those in β seem to refer mainly
to the logical process underlying intellectual activity.

Factor β is positively correlated with γ. This may explain the
loading of G in both factors.

The finding that Factors A, F, and G cluster in Factor γ is
interesting, since it seems to indicate that this represents mainly a
synthetic activity. In this respect γ seems to represent the
centuries-old concept of synthetic sense as enunciated first by
Aristotle, developed in the scholastic psychology, and reintroduced in
a somewhat modified form by the Gestalt school and by Moore and
Moyniham (21) in later times. These last authors conclude that many
tests involve the synthetic functions mainly when there is a
perception of relations.

The fact that the verbal factor appears in γ may indicate the
function that our means of expression play in intellectual activities.

If the second order refers to a more fundamental level of psy- 
chological dynamics, our results seem to indicate that one rela- 
tion—that of likeness—is a basic step in the solution of intellectual 
problems; that analysis and synthesis are the two main procedures 
at our disposal, one indicating the more abstract logical performance 
that is mainly involved in cognition of the “that” type, the other 
pointing to the combination of the different elements and their ex- 
pression by means of appropriate symbols. 

It is possible that individuals differ in the amount of these last
two abilities and that their use depends, to a great extent, upon
training and other conditions. In this sense it is possible to
conceive the feasibility of developing certain abilities. The process
would essentially involve making more readily available the use of
certain symbols. Cognition in this case evolves towards the so-called
“as” type.

It seems also possible, if our findings are correct, to assume that 
for practical and theoretical purposes it would be more convenient 
to think of concepts such as “g” as non-unitary in character. The
central intellective factor should be considered in terms of more 
fundamental and dynamically interacting activities. These latter 
seem to agree with the results of previous studies of an experimental 
and non-experimental nature. 

Some of the factors usually described as belonging to intellec- 
tual activity could well be only the results of test construction or par- 
ticular training, and it is our opinion that in the description of psy- 
chological parameters a careful distinction should be made between 
those results which are due to the test context and those which tran- 
scend the immediate context of the tests. 

Intellectual activity is a dynamic performance, and it would be 
of interest to know the effects of education, training, and matura- 
tion,* among others, upon the factors here described. The fact that 
these findings seem to agree with observations coming from other 
fields and that some of the factors here described seem to agree with 
those found by other authors and ourselves in different studies, is 
quite suggestive. Moreover, the identification of factors similar to 
those isolated in other countries with different cultural backgrounds 
and traditions is interesting in terms of the influence of environmen- 
tal factors in factorial invariance, and in general in psychological 
activities. 


*It has been suggested that maturation could explain part of our findings. 
This is a good hypothesis to test, but in this study no investigation of such a 
possibility was made for several reasons: among others, the fact that the age 
range here included is not large enough for a valid test of such a possibility 
and the fact that the study was not planned for such a purpose. 
FIGURE 1
Plot of “g” Loadings against Loadings in the First Second-Order Factor.

TABLE 1


Product Moment Correlations between the Tests* 


8 4 5 6 7 8 9 10 11 12 18 14 15 16 17 18 19 20 21 22 28 24 25 


45 

31 50 

35 56 12 
31 22 26 
82 34 27 
55 57 387 
31 48 39 
35 45 24 
40 48 32 
87 28 27 
26 32 19 
40 40 29 
46 62 32 
51 56 39 
34 40 69 
16 21 18 
36 45 30 
33 36 22 
24 55 16 
49 52 42 
21 29 03 


44 29 34 

47 13 26 42 

81 26 42 41 34 

86 22 28 42 42 35 
10 25 21 35 25 30 
15 12 25 30 14 21 
08 20 47 43 39 40 
43 19 25 53 38 35 
45 32 33 58 39 45 
40 14 25 45 31 27 
09 02 22 16 15 15 
38 21 24 40 30 37 
80 20 23 45 26 22 
57 27 35 59 44 25 
42 24 29 52 44 41 
24 07 08 20 26 17 


31 08 18 34 29 21 19 18 30 29 18 


*The decimal point has been omitted for all 


Test 


= 


COND oF CD 


24 

30 18 

37 31 26 

42 25 37 46 

46 38 33 39 41 
83 382 22 35 37 
16 04 31 13 17 
43 26 25 32 35 
86 65 25 24 33 
42 21 30 38 45 
39 28 25 40 40 
20 14 06 25 17 


entries. 


TABLE 2
“g” Loadings

Test   “g” Loading     Test   “g” Loading
  1       .54           10       .60
  2       .28           11       .57
  3       .62           12       .61
  4       .78           13       .49
  5       .54           14       .42
  6       .60           15       .58
  7       .38           16       .64
  8       .48           17       .74
  9       .80           18       .58

40 

19 14 

42 28 14 

39 21 18 26 


49 39 18 29 28 
44 44 23 41 31 50 


23 17 05 22 17 18 18 
19 18 11 22 23 27 19 18 12 22 25 32 15 


Test   “g” Loading
 19       .27
 20       .56
 21       .51
 22       .63
 23       .69
 24       .28
 25       .39
25 
TABLE 3
The Centroid Factorial Matrix*

Tests     I    II   III    IV     V    VI   VII
  1      35    19    50    18    10   -26    11
  2      21    09    46   -11   -08    07   -08
  3      58    12    22    14   -01    06   -11
  4      68    14    08    28    24    09    09
  5      37    20    25    06    68    21   -50
  6      41    04    05    79    08    13    16
  7      28    04    13    51   -20    16   -23
  8      47    30    03    03   -09   -22   -13
  9      61    15    32    35    16   -15    14
 10      56    00    12    16    26   -13    10
 11      56    17    13   -04   -08   -13   -21
 12      60    10    04    16    09    11    03
 13      42    01    56   -08    00    24   -08
 14      33    51    05    02   -08    18    09
 15      60    20    04   -03    00   -16   -16
 16      58    16    00    24    13    06    10
 17      62    18    24    28   -03    08   -04
 18      48    10    21    13    28    11   -18
 19      19    53   -09   -04    14    09    09
 20      53    07    12    14    00    09   -02
 21      40    20    33    09   -03    23    22
 22      51    24    01    46    03   -13    23
 23      59    14    15    23    21   -10   -06
 24      53   -27   -16   -09    00    13    18
 25      29    13    16    09    31   -14    11

*The decimal point has been omitted for all entries.


TABLE 4
The Final Transformation Matrix*

          A     B     C     D     E     F     G
I        62   -01   -01    02    01    02    31
II      -53    82   -07    00    01    01    17
III     -20   -17    91   -11    01    27   -11
IV      -23   -12   -13    88    02    34    10
V        03   -06   -23   -28    85    36   -53
VI       28    45    21    30    24   -62   -35
VII      38    28    21   -20   -47    54   -67

*The decimal point has been omitted for all entries.
TABLE 5
The Oblique Factor Matrix*

Tests     A     B     C     D     E     F     G
  1     -05   -05    36   -02   -02    46    07
  2      00    02    44   -08   -01   -02    09
  3      19    04    16    15    07    02    25
  4      34    11    00    19    20    21    03
  5     -05    02   -01    00    87   -05    01
  6      14    02   -02    68    05    33    01
  7     -04   -03    08    59   -01   -08    34
  8      01    10   -05    02   -06    06    41
  9      17   -02    19    17    05    45    09
 10      30   -09    02    01    16    33    02
 11      11    00    05    06    01    00    41
 12      32    10    01    15    10    06    11
 13      20    00    55   -04    10   -06    03
 14      00    52    08    08   -06   -06    11
 15      16    04   -05   -04    05    03    38
 16      28    14   -05    18    09    16    07
 17      18    09    18    27    03    09    26
 18      14    01    09    09    36    05    09
 19     -07    51   -11   -05    11    01   -01
 20      25    05    10    15    04    03    16
 21      20    26    38    08   -06    10   -08
 22      13    14   -06    32   -10    39    12
 23      16   -02    01    12    20    24    18
 24      63   -08   -06   -05   -05   -05   -03
 25      07    01    05   -09    19    34   -09

*The decimal point has been omitted for all entries.
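Assuming the usual relation between these tables, namely that the oblique factor matrix (Table 5) is the centroid matrix (Table 3) multiplied by the transformation matrix (Table 4), the entries can be cross-checked row by row. The sketch below does this for tests 1 and 24, with values as read from this copy (a few garbled entries of Table 4 were inferred from the requirement that its columns have unit length); the names `TRANSFORM`, `CENTROID`, and `PRINTED` are ours.

```python
# Table 4: rows I..VII, columns A..G, decimal points restored.
TRANSFORM = [
    [0.62, -0.01, -0.01, 0.02, 0.01, 0.02, 0.31],
    [-0.53, 0.82, -0.07, 0.00, 0.01, 0.01, 0.17],
    [-0.20, -0.17, 0.91, -0.11, 0.01, 0.27, -0.11],
    [-0.23, -0.12, -0.13, 0.88, 0.02, 0.34, 0.10],
    [0.03, -0.06, -0.23, -0.28, 0.85, 0.36, -0.53],
    [0.28, 0.45, 0.21, 0.30, 0.24, -0.62, -0.35],
    [0.38, 0.28, 0.21, -0.20, -0.47, 0.54, -0.67],
]

# Table 3: centroid loadings (I..VII) of two tests.
CENTROID = {
    1: [0.35, 0.19, 0.50, 0.18, 0.10, -0.26, 0.11],
    24: [0.53, -0.27, -0.16, -0.09, 0.00, 0.13, 0.18],
}

# Table 5: printed oblique loadings (A..G) of the same tests.
PRINTED = {
    1: [-0.05, -0.05, 0.36, -0.02, -0.02, 0.46, 0.07],
    24: [0.63, -0.08, -0.06, -0.05, -0.05, -0.05, -0.03],
}


def oblique_row(row):
    """One row of the centroid matrix times the transformation matrix."""
    return [sum(row[k] * TRANSFORM[k][j] for k in range(7)) for j in range(7)]


computed = {test: oblique_row(row) for test, row in CENTROID.items()}
for test in computed:
    print(test, [round(v, 2) for v in computed[test]])
```

The same product can be used to test doubtful entries elsewhere in the tables: a misread digit in Table 3 shows up as a discrepancy in every column of the corresponding row of Table 5.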


TABLE 6
Correlations between the Primary Vectors

        A      B      C      D      E      F      G
A     1.00    .33    .16    .12    .29    .33    .43
B      .33   1.00    .16    .01    .27    .38    .40
C      .16    .16   1.00    .10    .35    .11    .32
D      .12    .01    .10   1.00    .07    .08   -.14
E      .29    .27    .35    .07   1.00    .28    .41
F      .38    .38    .11    .08    .28   1.00    .47
G      .43    .40    .32   -.14    .41    .47   1.00
TABLE 7
Loadings of the Primaries in the
Centroids of the Second Order

        I'     II'    III'
A      .57    .09   -.13
B      .53    .19   -.05
C      .43   -.36    .23
D      .12   -.27   -.35
E      .59   -.21    .14
F      .58    .24   -.13
G      .73    .28    .31


TABLE 8

         α      β      γ
I'      .38    .40    .66
II'    -.60   -.54    .71
III'   -.70    .74   -.26


TABLE 9
Rotated Factorial Matrix for the
Primaries in the Second Order

         α      β      γ
A       .25    .08    .47
B       .12    .07    .50
C       .22    .54   -.03
D       .45   -.07   -.02
E       .25    .45    .20
F       .17    .01    .59
G      -.11    .37    .60


TABLE 10
Correlations between the
Second-Order Factors

         α      β      γ
α      1.00    .04    .00
β       .04   1.00    .30
γ       .00    .30   1.00
TABLE 11
Loadings of the Tests in the Orthogonal Second Order Factors

Test    I'     II'    III'      Test    I'     II'    III'
  1    .48    .01    .06         14    .39    .10    .00
  2    .28   -.12    .18         15    .58    .21    .14
  3    .55    .02    .06         16    .51    .09   -.11
  4    .62    .05   -.12         17    .62    .02    .00
  5    .65   -.19    .16         18    .51   -.07    .05
  6    .44   -.07   -.32         19    .28    .15   -.02
  7    .23   -.10   -.04         20    .46    .02    .00
  8    .45    .24    .13         21    .42   -.08   -.04
  9    .64    .05   -.07         22    .50    .17   -.19
 10    .49    .08   -.06         23    .59    .08    .00
 11    .52    .14    .15         24    .22    .08   -.10
 12    .51    .06   -.06         25    .34    .04   -.03
 13    .45   -.21    .16
ON THE STANDARD LENGTH OF A TEST*


Max A. WOODBURY 


INSTITUTE FOR ADVANCED STUDY 
AND 
UNIVERSITY OF MICHIGAN 


(1) A new descriptive parameter for tests, the standard length, 
is defined and related to reliability, correlation, and validity by 
means of simplified versions of known formulas. (2) The standard 
error of measurement is found to be related in simple fashion to 
the amount of information in a test in the sense of R. A. Fisher. 
The amount of information is computable as the test length divided 
by the standard length of the test. (3) The invariant properties of 
the standard length of a test under changes in length are discussed 
and proved. Similar results for the correlation coefficient corrected 
for attenuation and the index of validity are indicated. 


Introduction 


In connection with another study the notion of the standard
length of a test turned out to be a useful means of simplifying
notation and clarifying proofs. This brief note is presented to
introduce this new and possibly valuable notion. The standard length
is related to the information of R. A. Fisher† through the variance
of the errors of measurement. There is an indirect relation to the
type of information considered by Shannon‡ and later by Wiener.


It has long been recognized that the reliability of a test can be
used (under certain restrictions which do not concern us here) to
obtain the reliability of the test after it has been lengthened. Similar
relations hold for the correlation between tests or the correlation of
a test with a criterion (validity coefficient). This leads to the notion
of the reliability and the validity as mathematical functions of the
length of the test and the correlation between two tests as functions
of the test lengths.


*The research covered by this note was supported by the Office of Naval
Research.


†Fisher, R. A. Statistical Methods for Research Workers, 10th Edition.
London: Oliver and Boyd, 1946, p. 346.


‡Shannon, C. E. A mathematical theory of communication. Bell System
Technical Journal, 1948, 27, 379-428; 623-656.
Standard Length


The functional dependence of reliability upon test length is of
a rather special algebraic character and involves only one parameter.
It will be to the advantage of all if this parameter is chosen to
simplify the formula. In the usual form of the relation the parameter
is the reliability $r_{ii}$ at a given (observed) length, and the
formula gives the reliability $R_{ii}$ for a test of $e_i$ times the
original length. The well-known expression

$$R_{ii} = \frac{e_i r_{ii}}{1 + (e_i - 1)\, r_{ii}} \tag{1}$$

expresses this relation.


If we rather arbitrarily define (however, see the comment
following (4)) the standard length of the test $i$ as

$$\tau_i = \frac{t_i (1 - r_{ii})}{r_{ii}}, \tag{2}$$

where $t_i$ is the observed test length and $r_{ii}$ the observed
reliability, then we find that the standard length computed for a test
after it has been altered in length is the same as when computed for
the original length. Specifically we find, since the new length is
$e_i t_i$ and the new reliability is given in (1), that the new
standard length is

$$\frac{(e_i t_i)\left(1 - \dfrac{e_i r_{ii}}{1 + (e_i - 1)\, r_{ii}}\right)}{\dfrac{e_i r_{ii}}{1 + (e_i - 1)\, r_{ii}}} = \frac{t_i (1 - r_{ii})}{r_{ii}},$$


which is the same as before. It should be noted in passing that any 
other invariant of the test must be a function of the standard length, 
where by an invariant of the test we mean any parameter of the test 
which does not depend on the length. It is clear that any invariant 
describes the contents of the test, not the accidental feature of its 
length. The standard length of a test together with its length deter- 
mines the reliability. The formula for this purpose is 


$$r_{ii} = \frac{t_i}{t_i + \tau_i}, \tag{3}$$

where $t_i$ is the length of the test. From this it is easy to see
that when a test has a length equal to its standard length it has a
reliability of one-half, when it has a length of twice its standard
length it has a reliability of two-thirds, etc. As a matter of
convenience we note that in order to obtain a reliability of $r_{ii}$
the length of the test must be given by the relation

$$\frac{t_i}{\tau_i} = \frac{r_{ii}}{1 - r_{ii}}. \tag{4}$$

Other definitions of $\tau_i$ in (2) would lead to less simple formulas
for (3) and (4), so that this may be considered as justification for
the particular choice for $\tau_i$.
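The relations (1) through (4) are easy to try out numerically. The following is a minimal sketch; the function names and the example test (40 items with an observed reliability of .80) are hypothetical and serve only as illustration.

```python
def lengthened_reliability(r_ii, e_i):
    """Equation (1): reliability after the test is made e_i times as long."""
    return e_i * r_ii / (1 + (e_i - 1) * r_ii)


def standard_length(t_i, r_ii):
    """Equation (2): standard length from observed length and reliability."""
    return t_i * (1 - r_ii) / r_ii


def reliability_at_length(t_i, tau_i):
    """Equation (3): reliability as a function of length and standard length."""
    return t_i / (t_i + tau_i)


def length_for_reliability(tau_i, r_target):
    """Equation (4) solved for the length that yields a target reliability."""
    return tau_i * r_target / (1 - r_target)


# Hypothetical test: 40 items, observed reliability .80.
t, r = 40.0, 0.80
tau = standard_length(t, r)

# Invariance: the standard length is unchanged when the test is lengthened.
tau_after = standard_length(2.5 * t, lengthened_reliability(r, 2.5))

half = reliability_at_length(tau, tau)            # length = standard length
two_thirds = reliability_at_length(2 * tau, tau)  # length = twice standard length
needed = length_for_reliability(tau, 0.90)        # items needed for r = .90

print(tau, tau_after, half, two_thirds, needed)
```

In this example the standard length is 10 items, so roughly 90 items of the same kind would be needed to reach a reliability of .90.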


Fisher has used a concept of information which gives the variance
of errors as the reciprocal of the amount of information. This
concept can be related to the reliability through the easily derived
formula

$$r_{ii} = \frac{1}{1 + \sigma_i^2}, \tag{5}$$

where $\sigma_i$ is the standard error of measurement and the standard
deviation of the true scores is taken as a unit. Combining this
equation with (3) we see that the amount of information is

$$I = \frac{1}{\sigma_i^2} = \frac{t_i}{\tau_i}, \tag{6}$$

i.e., the length of the test measured in terms of its standard length
as a unit. Thus a unit of information is the amount of information
in a test of standard length.
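As a small numerical illustration of (5) and (6), the sketch below computes the amount of information in two ways: from the length and standard length, and from the error variance implied by the reliability. The example values and function names are hypothetical.

```python
def error_variance(r_ii):
    """Error variance implied by (5), true-score variance taken as 1."""
    return (1 - r_ii) / r_ii


def information(t_i, tau_i):
    """Equation (6): test length in units of its standard length."""
    return t_i / tau_i


# Hypothetical test: 40 items, observed reliability .80.
t, r = 40.0, 0.80
tau = t * (1 - r) / r            # standard length, equation (2)

info_from_length = information(t, tau)
info_from_error = 1 / error_variance(r)
print(info_from_length, info_from_error)
```

Both routes give the same figure, four units of information in this example, which is what (6) asserts.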


Correlation and Validity 


The formula analogous to (1) for computing the correlation
between the tests $i$ and $j$ after each has been lengthened is*

$$R_{ij} = r_{ij}\sqrt{\left(\frac{e_i}{1 + (e_i - 1)\, r_{ii}}\right)\left(\frac{e_j}{1 + (e_j - 1)\, r_{jj}}\right)}, \tag{7}$$

where $R_{ij}$ is the correlation between the lengthened tests, $e_i$
and $e_j$ are the ratios of the lengths of the lengthened tests to the
original tests, and $r_{ii}$, $r_{jj}$, and $r_{ij}$ are the original
reliabilities and correlation. By noting the relationship of (7) to
(1) and (3) one can write down immediately the equation for the
correlation as a function of the lengths of the tests, viz.

$$r_{ij} = r_{i\infty, j\infty}\sqrt{\left(\frac{t_i}{t_i + \tau_i}\right)\left(\frac{t_j}{t_j + \tau_j}\right)}, \tag{8}$$

where

$$r_{i\infty, j\infty} = \frac{r_{ij}}{\sqrt{r_{ii}\, r_{jj}}} \tag{9}$$

is the correlation coefficient corrected for attenuation and where the
other symbols are defined as in (3). It should be noted that the
correlation coefficient corrected for attenuation is the same for the
lengthened tests as for the original tests, so that it, like the
standard lengths, is invariant under changes of length and describes a
property of the content of the tests only and not of their lengths.
To prove this, substitute in (9) from (7) and (1) to obtain the new
coefficient corrected for attenuation. Further, any other invariant
of the two tests must be a function of the three already described,
viz., the standard lengths and the correlation coefficient corrected
for attenuation.


*See Peters, C. C., and Van Voorhis, W. R. Statistical Procedures and their
Mathematical Bases. N. Y.: McGraw-Hill Book Co., 1940, Eq. 111, p. 193.
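The invariance of the corrected coefficient under lengthening can be checked numerically by carrying out the substitution described above. The reliabilities, correlation, and lengthening ratios in this sketch are hypothetical, and the function names are ours.

```python
import math


def lengthened_reliability(r_ii, e_i):
    """Equation (1)."""
    return e_i * r_ii / (1 + (e_i - 1) * r_ii)


def lengthened_correlation(r_ij, r_ii, r_jj, e_i, e_j):
    """Equation (7): correlation after both tests are lengthened."""
    return r_ij * math.sqrt(
        (e_i / (1 + (e_i - 1) * r_ii)) * (e_j / (1 + (e_j - 1) * r_jj))
    )


def corrected_for_attenuation(r_ij, r_ii, r_jj):
    """Equation (9)."""
    return r_ij / math.sqrt(r_ii * r_jj)


# Hypothetical original values and lengthening ratios.
r_ii, r_jj, r_ij = 0.80, 0.70, 0.60
e_i, e_j = 2.0, 3.0

before = corrected_for_attenuation(r_ij, r_ii, r_jj)
after = corrected_for_attenuation(
    lengthened_correlation(r_ij, r_ii, r_jj, e_i, e_j),
    lengthened_reliability(r_ii, e_i),
    lengthened_reliability(r_jj, e_j),
)
print(round(before, 4), round(after, 4))
```

The observed correlation rises when the tests are lengthened, but the corrected coefficient stays where it was.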

The case of correlation with a criterion (validity) scarcely needs
separate treatment. Let $c$ denote the criterion and $r_{ic}$ the
validity coefficient of test $i$ at length $t_i$, and we have

$$r_{ic} = r_{i\infty, c}\sqrt{\frac{t_i}{t_i + \tau_i}}, \tag{10}$$

where $r_{i\infty, c}$ is the index of validity, computable from the
formula

$$r_{i\infty, c} = \frac{r_{ic}}{\sqrt{r_{ii}}}. \tag{11}$$

The index of validity, like the standard length and the correlation
corrected for attenuation, is invariant under changes in the length
of the test $i$. From (10) we can find the length of the test which
will give a specified validity. Note that only validities smaller in
absolute value than the index of validity can be obtained and that
the sign of the validity is unchanged by lengthening the test. Let
$r_{ic}$ be the desired validity and let $t_i$ be the length of the
test which will give this validity, and we have

$$t_i = \frac{\tau_i\, r_{ic}^2}{r_{i\infty, c}^2 - r_{ic}^2}. \tag{12}$$
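Equations (10) and (12) can be exercised together: compute the length required for a desired validity, then confirm that a test of that length has the desired validity. The sketch below assumes a hypothetical test with a standard length of 10 items and an index of validity of .60; the function names are ours.

```python
import math


def validity_at_length(t_i, tau_i, index_of_validity):
    """Equation (10): validity of the test at length t_i."""
    return index_of_validity * math.sqrt(t_i / (t_i + tau_i))


def length_for_validity(tau_i, index_of_validity, desired):
    """Equation (12): length that gives the desired validity."""
    if abs(desired) >= abs(index_of_validity):
        # Only validities smaller in absolute value than the index are attainable.
        raise ValueError("desired validity must be below the index of validity")
    return tau_i * desired ** 2 / (index_of_validity ** 2 - desired ** 2)


# Hypothetical values.
tau, index = 10.0, 0.60

t_needed = length_for_validity(tau, index, 0.50)
check = validity_at_length(t_needed, tau, index)
print(round(t_needed, 2), round(check, 4))
```

The required length grows without bound as the desired validity approaches the index of validity, which is why the function refuses values at or beyond it.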
THE ESTIMATION OF THE PARAMETERS OF A NEGATIVE
BINOMIAL DISTRIBUTION WITH SPECIAL REF- 
ERENCE TO PSYCHOLOGICAL DATA* 


HERBERT S. SICHEL 


NATIONAL INSTITUTE FOR PERSONNEL RESEARCH, 
SOUTH AFRICAN COUNCIL FOR SCIENTIFIC AND INDUSTRIAL RESEARCH 


As an analytical tool the negative binomial distribution may 
have wide applications in the psychological field. The estimation of 
its parameters is demonstrated to be often inefficient when fitting 
by the method of moments. This causes possibly true hypotheses 
to be rejected. Formulas for the efficiency of the moment method 
and solution of the likelihood equations are derived. Efficiency
graphs and detailed tables for the A(r, p) function reduce the
maximum-likelihood method to a minimum of computational labour. 
Practical applications of the ease and power of the M.L. procedure 
are given. 


Introduction 


To mathematical statisticians the principles of maximum likelihood
have been a topic of interest ever since R. A. Fisher (1) wrote his
classic paper in 1921. On the applied side, however, the method has
only come into vogue within the past decade. Of late, an ever-in- 
creasing stream of scientific papers appearing in a wide variety of 
journals testifies to the importance of “efficient estimation” in the 
practical field. 


In the majority of psychometric studies we deal with normal, 
nearly normal, or normalized variables. This explains why the need 
for “efficient estimation” has not arisen in this branch of applied 
science to the extent that it has in others. It is a well-known fact, of 
course, that the customary moment statistics are efficient in the case 
of a normal population. 


*The author wishes to express his gratitude to the South African Council 
for Scientific and Industrial Research for permission to publish this paper; to 
Mr. A. G. Arbous for supplying the data on absenteeism; to Mr. R. V. Sutton 
for the data on Two-hand Co-ordination errors; and last, but not least, to the 
staff of the Statistical Section who computed various tables under the expert 
guidance of Mr. J. S. Maritz. 
On the other hand there exists one class of psychometric problems
which can be measured only on a discrete scale. If the distribution
of the discrete variable is very skew (J-shaped) it is virtually
impossible to normalize the original data. At best the transformation
of a naturally discrete variable into a continuous one remains
a dubious procedure. It appears to be more justified, both from the
theoretical and the practical point of view, to work with the observed
discrete variables.


A discrete probability distribution which is very flexible is the 
negative binomial law, sometimes referred to as the Greenwood-Yule 
curve (2). Its rationale was primarily built around the unequal lia- 
bility of individuals to accidents and it may be derived as follows: 

Suppose an individual has r accidents in a given time unit. If
his exposure period is composed of many equal time units he will
incur r accidents in y(r) units, where r = 0, 1, 2, 3, ..., ∞. Provided
the frequency distribution of his accidents may satisfactorily be described
by the Poisson law we then may define the probability of
his having r accidents per time unit as

$$\text{prob.}(r) = \frac{e^{-\lambda}\,\lambda^{r}}{r!} .$$


$\lambda$, the average number of accidents per time unit, may be considered
as a measure of the individual's liability to accidents. In a group of
operators, of whom all are exposed to the same environmental hazard,
the parameter $\lambda$ will vary from person to person according to
the differential proneness to accidents. If the probability distribution
of liabilities among the group is given as

$$dF = \frac{c^{p}}{\Gamma(p)}\, e^{-c\lambda}\, \lambda^{p-1}\, d\lambda , \qquad (0 < \lambda < \infty)$$


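we shall arrive at the number of persons having r accidents in a
single time unit by integration of

$$A(r) = N \int_{0}^{\infty} \frac{c^{p}}{\Gamma(p)}\, e^{-c\lambda}\, \lambda^{p-1}\, \frac{e^{-\lambda}\,\lambda^{r}}{r!}\, d\lambda
= N \left(\frac{c}{c+1}\right)^{p} \frac{\Gamma(r+p)}{\Gamma(p)\,\Gamma(r+1)}\, \frac{1}{(c+1)^{r}} .$$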
The frequencies of 0, 1, 2, 3, ... accidents in the group are given,
therefore, by

$$N \left(\frac{c}{c+1}\right)^{p} \left\{ 1,\; \frac{p}{c+1},\; \frac{p(p+1)}{2!\,(c+1)^{2}},\; \frac{p(p+1)(p+2)}{3!\,(c+1)^{3}},\; \cdots \right\} ,$$

which is the customary expression for the Greenwood-Yule distribution.
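
The integration step can be checked numerically. The following is a minimal sketch, not part of the original derivation: it integrates the Poisson probability against the liability distribution by Simpson's rule and compares the result with the closed form. The parameter values p = 2.5 and c = 1.5, the integration limit, and all function names are illustrative assumptions.

```python
import math

def liability_density(lam, p, c):
    """Density of liabilities: c^p / Gamma(p) * exp(-c*lam) * lam^(p-1)."""
    return math.exp(p * math.log(c) - math.lgamma(p) - c * lam + (p - 1) * math.log(lam))

def poisson_prob(r, lam):
    """Probability of r accidents for an individual with liability lam."""
    return math.exp(-lam + r * math.log(lam) - math.lgamma(r + 1))

def mixture_by_integration(r, p, c, upper=40.0, steps=40000):
    """Simpson's rule over (0, upper); steps must be even.

    The integrand is zero at lam = 0 whenever p + r > 1, so that end point
    contributes nothing and is skipped.
    """
    h = upper / steps
    total = poisson_prob(r, upper) * liability_density(upper, p, c)
    for k in range(1, steps):
        lam = k * h
        weight = 4.0 if k % 2 else 2.0
        total += weight * poisson_prob(r, lam) * liability_density(lam, p, c)
    return total * h / 3.0

def closed_form(r, p, c):
    """(c/(c+1))^p * Gamma(r+p) / (Gamma(p) Gamma(r+1)) * (c+1)^(-r)."""
    log_f = (p * math.log(c / (c + 1.0)) + math.lgamma(r + p)
             - math.lgamma(p) - math.lgamma(r + 1) - r * math.log(c + 1.0))
    return math.exp(log_f)

if __name__ == "__main__":
    for r in range(6):
        print(r, mixture_by_integration(r, 2.5, 1.5), closed_form(r, 2.5, 1.5))
```

Multiplying either column by N gives the expected number of persons with r accidents.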








The negative binomial law may have wide and important appli- 
cations in the psychological field, as it often is capable of describing 
satisfactorily such diverse phenomena as accidents, number of ab- 
sences, number of days lost, errors made in a test, etc. 


The two parameters of the negative binomial law are estimated, 
almost universally, by the method of moments. Even such an ad- 
vanced mathematical statistics text as Kendall’s (3) only refers to 
this procedure. Research workers on industrial accidents, without 
exception, use the same method. The only papers which draw atten- 
tion to the inefficiency of the moment estimation in certain cases of 
the negative binomial law are Fisher’s (4) and Haldane’s (5) in 
Annals of Eugenics, a journal not ordinarily read by psychologists. 
The first part of the present investigation was undertaken by the 
author without any prior knowledge of Fisher’s and Haldane’s work. 
It, therefore, slightly differs from their approach but leads to iden- 
tical results. 


It will be shown that the practical application of the negative 
binomial law to phenomena in which the psychologist is mainly in- 
terested results very often in a rejection of a hypothesis which may 
really be true. Frequently, this is solely due to the employment of an 
inefficient estimation procedure. 


The likelihood equation leads to mathematical expressions awk- 
ward to handle in the computation process. For this reason efficiency 
graphs and tables involving di- and trigamma functions were calcu- 
lated and are included in this paper to facilitate the estimation of 
the unknown parameters. All the research worker now needs is an 
ordinary table of logarithms. The writer claims that, with the help 
of the graphs and tables, the computational labour has been cut down 
to a minimum. The advantage of using an efficient method of esti- 
mation is great, as illustrated by the examples quoted at the end of 
this paper. 
The first four population moments and the large sample moment
estimators of the negative binomial distribution 


For reasons which will become apparent at a later stage, it is
convenient to write the negative binomial law as

$$f(r) = \left(\frac{p}{p+m}\right)^{p} \frac{\Gamma(r+p)}{\Gamma(p)\,\Gamma(r+1)} \left(\frac{m}{p+m}\right)^{r} , \qquad (1)$$

where r = 0, 1, 2, ..., ∞, and m and p are the parameters in the
law.


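The characteristic function of (1) is

$$\phi(t) = \sum_{r=0}^{\infty} e^{itr} f(r) = \left[ 1 - \left(e^{it} - 1\right) \frac{m}{p} \right]^{-p} ,$$

and its cumulant generating function

$$\psi(t) = \log \phi(t) = -p \log \left[ 1 - \left(e^{it} - 1\right) \frac{m}{p} \right] . \qquad (2)$$

Expansion of (2) gives

$$\psi(t) = p \left\{ \left[ \frac{it}{1!} + \frac{(it)^{2}}{2!} + \frac{(it)^{3}}{3!} + \cdots \right] \frac{m}{p}
+ \frac{1}{2} \left[ \frac{it}{1!} + \frac{(it)^{2}}{2!} + \frac{(it)^{3}}{3!} + \cdots \right]^{2} \left(\frac{m}{p}\right)^{2}
+ \frac{1}{3} \left[ \frac{it}{1!} + \frac{(it)^{2}}{2!} + \cdots \right]^{3} \left(\frac{m}{p}\right)^{3}
+ \frac{1}{4} \left[ \frac{it}{1!} + \frac{(it)^{2}}{2!} + \cdots \right]^{4} \left(\frac{m}{p}\right)^{4} + \cdots \right\} . \qquad (3)$$

By expanding the square brackets in (3) and collecting terms of
$(it)^{j}/j!$ we find the jth cumulant being the factor of $(it)^{j}/j!$. We then
have

$$\kappa_1 = m , \qquad \kappa_2 = m + \frac{m^{2}}{p} , \qquad \kappa_3 = m + \frac{3m^{2}}{p} + \frac{2m^{3}}{p^{2}} , \qquad
\kappa_4 = m + \frac{7m^{2}}{p} + \frac{12m^{3}}{p^{2}} + \frac{6m^{4}}{p^{3}} . \qquad (4)$$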
Finally, from the well known relations between cumulants and moments,

$$\mu_1' = m , \qquad (5a)$$

$$\mu_2 = m + \frac{m^{2}}{p} , \qquad (5b)$$

$$\mu_3 = m + \frac{3m^{2}}{p} + \frac{2m^{3}}{p^{2}} , \quad \text{and} \qquad (5c)$$

$$\mu_4 = m + 3m^{2} + \frac{7m^{2} + 6m^{3}}{p} + \frac{12m^{3} + 3m^{4}}{p^{2}} + \frac{6m^{4}}{p^{3}} . \qquad (5d)$$

One of the advantages of choosing the law in the form of equation
(1) is brought out readily by equation (5a): the parameter m is
identical with the population mean $\mu_1'$.
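
The moment formulas (5a) to (5d) can be verified by direct summation over the probabilities of equation (1). The sketch below is illustrative only; the values m = 1.3 and p = 0.7, the truncation point of the sum, and the function names are assumptions chosen for the check.

```python
def nb_probabilities(m, p, r_max):
    """Probabilities f(0), ..., f(r_max) of eq. (1), built with the recurrence
    f(r+1) = f(r) * (p + r) / (r + 1) * m / (p + m)."""
    x = m / (p + m)
    probs = [(p / (p + m)) ** p]
    for r in range(r_max):
        probs.append(probs[-1] * (p + r) / (r + 1) * x)
    return probs

def summed_moments(m, p, r_max=2000):
    """Mean and central moments of order 2, 3, 4 by direct summation."""
    probs = nb_probabilities(m, p, r_max)
    mean = sum(r * f for r, f in enumerate(probs))

    def central(k):
        return sum((r - mean) ** k * f for r, f in enumerate(probs))

    return mean, central(2), central(3), central(4)

def formula_moments(m, p):
    """Right-hand sides of (5a), (5b), (5c) and (5d)."""
    mu2 = m + m**2 / p
    mu3 = m + 3 * m**2 / p + 2 * m**3 / p**2
    mu4 = (m + 3 * m**2 + (7 * m**2 + 6 * m**3) / p
           + (12 * m**3 + 3 * m**4) / p**2 + 6 * m**4 / p**3)
    return m, mu2, mu3, mu4

if __name__ == "__main__":
    print(summed_moments(1.3, 0.7))
    print(formula_moments(1.3, 0.7))
```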


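From (5a) and (5b) we find the moment estimators $\tilde{m}$ and $\tilde{p}$
of the parameters m and p, for large sample size n,

$$\tilde{m} = m_1' , \quad \text{and} \qquad (6a)$$

$$\tilde{p} = \frac{(m_1')^{2}}{m_2 - m_1'} , \qquad (6b)$$

where $m_1'$ and $m_2$ are the estimates of the parent moments $\mu_1'$ and $\mu_2$.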


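The variances of moment estimators $\tilde{m}$ and $\tilde{p}$ for large samples

The variance of the mean is

$$\mathrm{Var}(m_1') = \frac{\mu_2}{n} ,$$

and, by substitution of (5b) into the above,

$$\mathrm{Var}(\tilde{m}) = \mathrm{Var}(m_1') = \frac{m(p+m)}{pn} . \qquad (7)$$

$\tilde{p}$ is a function of the moments, say

$$\tilde{p} = \tilde{p}\,(m_1', m_2) ,$$

hence, for large n, approximately

$$\mathrm{Var}(\tilde{p}) = \left(\frac{\partial \tilde{p}}{\partial m_1'}\right)^{2} \mathrm{Var}(m_1')
+ 2\, \frac{\partial \tilde{p}}{\partial m_1'}\, \frac{\partial \tilde{p}}{\partial m_2}\, \mathrm{Cov}(m_1', m_2)
+ \left(\frac{\partial \tilde{p}}{\partial m_2}\right)^{2} \mathrm{Var}(m_2) . \qquad (8)$$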

Substitution of the appropriate partial derivatives of (6b) and of
standard expressions for Var($m_1'$), Var($m_2$) and Cov($m_1'$, $m_2$) into
equation (8) gives (after replacing the statistics by their expectations)

$$\mathrm{Var}(\tilde{p}) = \frac{(\mu_1')^{2}}{n\,(\mu_2 - \mu_1')^{4}}
\left[ (2\mu_2 - \mu_1')^{2}\, \mu_2 + (\mu_1')^{2} (\mu_4 - \mu_2^{2}) - 2 \mu_1' \mu_3\, (2\mu_2 - \mu_1') \right] . \qquad (9)$$

Substitution of (5a, b, c, d) into (9) yields

$$\mathrm{Var}(\tilde{p}) = \frac{2p\,(1+p)\,(1 + p/m)^{2}}{n} , \qquad (10)$$

this being the large sample variance of $\tilde{p}$.


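The efficiency of moment estimators $\tilde{m}$ and $\tilde{p}$ for large samples

If we want to gauge the efficiency of the moment estimators we
must evaluate the Hessian determinant

$$\Delta = \begin{vmatrix}
-E\left(\dfrac{\partial^{2} \log f(r)}{\partial m^{2}}\right) & -E\left(\dfrac{\partial^{2} \log f(r)}{\partial m\, \partial p}\right) \\[2ex]
-E\left(\dfrac{\partial^{2} \log f(r)}{\partial m\, \partial p}\right) & -E\left(\dfrac{\partial^{2} \log f(r)}{\partial p^{2}}\right)
\end{vmatrix} , \qquad (11)$$

whence the variances of the maximum-likelihood estimators $\hat{m}$ and $\hat{p}$
(for large n) are

$$\mathrm{Var}(\hat{m}) = -\frac{1}{\Delta n}\, E\left(\frac{\partial^{2} \log f(r)}{\partial p^{2}}\right) , \quad \text{and} \qquad (12)$$

$$\mathrm{Var}(\hat{p}) = -\frac{1}{\Delta n}\, E\left(\frac{\partial^{2} \log f(r)}{\partial m^{2}}\right) . \qquad (13)$$

Now

$$\frac{\partial^{2} \log f(r)}{\partial m\, \partial p} = \frac{r - m}{(p+m)^{2}} ,$$

and its expected mean value

$$E\left(\frac{\partial^{2} \log f(r)}{\partial m\, \partial p}\right) = \sum_{r=0}^{\infty} \frac{r - m}{(p+m)^{2}}\, f(r) = 0 .$$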
Hence it follows that $\hat{m}$ and $\hat{p}$ are uncorrelated. This is the second
advantage of writing the negative binomial in the form of equation
(1). Equations (12) and (13) simplify into

$$\mathrm{Var}(\hat{m}) = -\frac{1}{n\, E\left(\dfrac{\partial^{2} \log f(r)}{\partial m^{2}}\right)} , \qquad (12a)$$

$$\mathrm{Var}(\hat{p}) = -\frac{1}{n\, E\left(\dfrac{\partial^{2} \log f(r)}{\partial p^{2}}\right)} . \qquad (13a)$$

Writing

$$\mathrm{Trig.}(x) = \frac{d^{2} \log \Gamma(x+1)}{dx^{2}} ,$$

where Trig.(x) is the trigamma function, we find after some differentiation
and summation

$$\mathrm{Var}(\hat{m}) = \frac{m(p+m)}{np} , \qquad (14)$$

$$\mathrm{Var}(\hat{p}) = \frac{1}{n} \left[ \left(\frac{p}{p+m}\right)^{p} \frac{1}{\Gamma(p)} \sum_{r=0}^{\infty} \left(\frac{m}{p+m}\right)^{r} \frac{\Gamma(r+p)}{\Gamma(r+1)}
\left\{ \mathrm{Trig.}(p-1) - \mathrm{Trig.}(p+r-1) \right\} - \frac{m}{p(p+m)} \right]^{-1} . \qquad (15)$$

Equations (7) and (14) are identical. Hence it follows that the moment
estimator $\tilde{m}$ has maximum efficiency. It may also be shown
that $\tilde{m}$ is a sufficient estimator. The efficiency of $\tilde{p}$ is

$$\mathrm{Eff.}(\tilde{p}) = \frac{\mathrm{Var}(\hat{p})}{\mathrm{Var}(\tilde{p})} = \frac{\text{eq. (15)}}{\text{eq. (10)}} . \qquad (16)$$

Equations (15) and (16) may be simplified considerably, as indicated
by Fisher (4).










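By writing

$$\frac{p}{p+m} = 1 - \varepsilon , \qquad \frac{m}{p+m} = \varepsilon ,$$

and using the relation

$$\mathrm{Trig.}(p-1) - \mathrm{Trig.}(p+r-1) = \sum_{\nu=1}^{r} (p+\nu-1)^{-2} ,$$

where Trig.(x) is the trigamma function, we find, after some algebra,

$$\mathrm{Var}(\hat{p}) = \left[ n \sum_{r=2}^{\infty} \left(\frac{m}{p+m}\right)^{r} \frac{\Gamma(r)\,\Gamma(p)}{r\,\Gamma(r+p)} \right]^{-1} \qquad (15a)$$

and

$$\mathrm{Eff.}(\tilde{p}) = \left[ 2 \sum_{r=2}^{\infty} \left(\frac{m}{p+m}\right)^{r-2} \frac{\Gamma(r)\,\Gamma(p+2)}{r\,\Gamma(r+p)} \right]^{-1} . \qquad (16a)$$

The reciprocal of the efficiency, expanded as a series, gives

$$[\mathrm{Eff.}(\tilde{p})]^{-1} = 1 + \frac{2 \times 2!}{3(p+2)} \left(\frac{m}{p+m}\right)
+ \frac{2 \times 3!}{4(p+2)(p+3)} \left(\frac{m}{p+m}\right)^{2}
+ \frac{2 \times 4!}{5(p+2)(p+3)(p+4)} \left(\frac{m}{p+m}\right)^{3} + \cdots , \qquad (17)$$

which is the same as Fisher's expression (4).

For p small and finite m (17) tends towards the divergent series

$$2 \left( \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} + \cdots \right) ;$$

hence

$$\lim_{p \to 0} \mathrm{Eff.}(\tilde{p}) = 0 .$$

For p → ∞ and all values of m,

$$\lim_{p \to \infty} \mathrm{Eff.}(\tilde{p}) = 1 .$$

For m → 0 and p finite,

$$\lim_{m \to 0} \mathrm{Eff.}(\tilde{p}) = 1 .$$

For m large, equations (15) and (10) become

$$\mathrm{Var}(\hat{p}) \approx \frac{p}{n\,[\,p\,\mathrm{Trig.}(p-1) - 1\,]} , \qquad \mathrm{Var}(\tilde{p}) \approx \frac{2p(1+p)}{n} ,$$

hence

$$\lim_{m \to \infty} \mathrm{Eff.}(\tilde{p}) = \left\{ 2(p+1)\,[\,p\,\mathrm{Trig.}(p-1) - 1\,] \right\}^{-1} , \qquad (18)$$

where Trig.(x) is the trigamma function.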


Equation (18) gives the lower bound of Eff.($\tilde{p}$) below which the
efficiency of estimator $\tilde{p}$ cannot sink. For p = 5.5 the minimum
efficiency of moment estimator $\tilde{p}$ is .80. For values of p > 5.5 it is
not necessary, therefore, to estimate p by the more arduous
maximum-likelihood method.
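
Series (17) and the limit (18) are easy to evaluate without the graph. The following sketch is an illustration under stated assumptions, not the procedure used to draw Figure 1: the trigamma routine (recurrence plus a short asymptotic series), the stopping tolerance and the function names are all choices made here. In modern notation the paper's Trig.(p − 1) is the trigamma function evaluated at p.

```python
import math

def trigamma(x):
    """d^2 log Gamma(x) / dx^2 for x > 0: recurrence up to x >= 8, then
    the asymptotic series 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)."""
    total = 0.0
    while x < 8.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return total + inv * (1.0 + inv * (0.5 + inv * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))))

def efficiency(m, p, tol=1e-12, max_terms=100000):
    """Eff. of the moment estimator of p, as the reciprocal of series (17)."""
    x = m / (p + m)
    term = 1.0            # leading term of (17)
    total = 1.0
    j = 1
    while term > tol and j < max_terms:
        # ratio of successive terms of (17): (j+1)^2 / ((j+2)(p+j+1)) * m/(p+m)
        term *= (j + 1) ** 2 / ((j + 2) * (p + j + 1)) * x
        total += term
        j += 1
    return 1.0 / total

def efficiency_lower_bound(p):
    """Limit (18) for m -> infinity; Trig.(p - 1) in the text equals trigamma(p)."""
    return 1.0 / (2.0 * (p + 1.0) * (p * trigamma(p) - 1.0))

if __name__ == "__main__":
    print(efficiency_lower_bound(5.5))      # close to the .80 quoted in the text
    print(efficiency(0.66981, 0.52197))     # moment estimates of the absence example
```

For values of m/(p + m) close to one the series converges slowly, which is why the loop is capped by the assumed `max_terms`.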


Exact values of Eff.($\tilde{p}$) were calculated for various levels of
p and m. They are graphed in Figure 1. After a preliminary estimation
of the parameters by the method of moments, Figure 1 will
indicate whether it is advisable to estimate by maximum likelihood.
Although we enter Figure 1 with the (often) inefficient moment estimate,
it has been found in practice that the decision arrived at from
the graph is almost always right. This may be verified by subsequently
reading off the efficiency based on the maximum-likelihood
estimates and comparing with the efficiency value found from the
moment solution. It is desirable to estimate by the method of maximum
likelihood whenever Eff.($\tilde{p}$) < .80.


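Large sample maximum-likelihood estimators $\hat{m}$ and $\hat{p}$

The likelihood solution is given by

$$\frac{\partial \log L}{\partial m} = -\frac{np}{p+m} + \frac{p}{m(p+m)} \sum_{i=1}^{n} r_i = 0 ,$$

$$\frac{\partial \log L}{\partial p} = n \log \left(\frac{p}{p+m}\right) - n\, \mathrm{Dig.}(p-1) + n\, \frac{m}{p+m}
- \frac{1}{p+m} \sum_{i=1}^{n} r_i + \sum_{i=1}^{n} \mathrm{Dig.}(p + r_i - 1) = 0 , \qquad (19)$$

where we write

$$\mathrm{Dig.}(x) = \frac{d \log \Gamma(x+1)}{dx} ,$$

Dig.(x) being the digamma function. By solving (19) for m and
p we find

$$\hat{m} = \frac{1}{n} \sum_{i=1}^{n} r_i , \qquad (20)$$

$$\frac{1}{n} \sum_{i=1}^{n} \left[ \mathrm{Dig.}(\hat{p} + r_i - 1) - \mathrm{Dig.}(\hat{p} - 1) \right] - \log \left( 1 + \frac{\hat{m}}{\hat{p}} \right) = 0 , \qquad (21)$$

where Dig.(x) is the digamma function.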


To facilitate the numerical solution of equation (21) for $\hat{p}$ the function

$$A(r, p) = \mathrm{Dig.}(p + r - 1) - \mathrm{Dig.}(p - 1)$$

has been tabulated for various values of p and for r = 0, 1, 2, ..., 35.

The table of A(r, p) is included in this paper (Table 1). Should
values of r > 35 be required we may make use of a simple approximation
for large r, i.e.

$$A(r, p) \approx \log(p + r - 1) + \frac{1}{2(p + r - 1)} - \frac{1}{12(p + r - 1)^{2}} - \mathrm{Dig.}(p - 1) .$$

The function Dig.(p − 1) is given in Table 2.
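
Because differences of digamma values at arguments one unit apart reduce to reciprocals, A(r, p) is the finite sum of 1/(p + ν − 1) for ν = 1, ..., r, so tabular entries can be reproduced directly. The sketch below is illustrative: the digamma routine (recurrence plus asymptotic series) and the function names are assumptions made here, and the paper's Dig.(p − 1) corresponds to the digamma function at p in modern notation.

```python
import math

def a_exact(r, p):
    """A(r, p) as the finite sum of 1 / (p + v - 1) for v = 1, ..., r."""
    return sum(1.0 / (p + v - 1) for v in range(1, r + 1))

def digamma(x):
    """d log Gamma(x) / dx for x > 0: recurrence up to x >= 8, then
    log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)."""
    total = 0.0
    while x < 8.0:
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return total + math.log(x) - 0.5 / x - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))

def a_large_r(r, p):
    """Approximation for large r given in the text; Dig.(p - 1) is digamma(p)."""
    x = p + r - 1
    return math.log(x) + 1.0 / (2.0 * x) - 1.0 / (12.0 * x * x) - digamma(p)

if __name__ == "__main__":
    print(a_exact(2, 0.1))                          # Table 1 lists 10.90909
    print(a_exact(40, 0.5), a_large_r(40, 0.5))     # exact sum against the approximation
```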


The variances of the maximum-likelihood estimators $\hat{m}$ and $\hat{p}$
were previously given as equations (14) and (15a).


Applications 
(a) Absence Proneness: 

In an intensive study of absenteeism among industrial workers, 
the result of which is to be published in due course, it was established 
that liability to absence from work differs from person to person. 
Thus we may speak of absence proneness just as we also speak of 
accident proneness. In order to test for absence proneness we may 
collect data on the number of absences of individuals in a given time 
unit, arrange them into a frequency distribution, and fit the negative 
binomial. If the resulting $\chi^2$-test is satisfactory we may take this as
an indication that among the human population from which the sample
was drawn there exists unequal liability to absenteeism. (This
procedure of testing the hypothesis is a necessary one but it is not
sufficient, as shown by Maritz (6).)

Now Fisher (7) has shown that the use of the $\chi^2$-test is only
legitimate if the method of estimation is “efficient.” In the case of
TABLE 1
The Function A(r, p)*

r\p       0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9       1.0
  0         0         0         0         0         0         0         0         0         0         0
  1  10.00000   5.00000   3.33333   2.50000   2.00000   1.66667   1.42857   1.25000   1.11111   1.00000
  2  10.90909   5.83333   4.10256   3.21428   2.66667   2.29167   2.01680   1.80556   1.63743   1.50000
  3  11.38528   6.28788   4.53734   3.63095   3.06667   2.67628   2.38717   2.16270   1.98226   1.83334
  4  11.70786   6.60038   4.84037   3.92506   3.35238   2.95406   2.65744   2.42586   2.23867   2.08334
  5  11.95176   6.83847   5.07293   4.15234   3.57460   3.17145   2.87021   2.63419   2.44275   2.28334
  6  12.14784   7.03078   5.26161   4.33752   3.75642   3.35002   3.04565   2.80660   2.61224   2.45000
  7  12.31177   7.19207   5.42034   4.49377   3.91027   3.50154   3.19490   2.95366   2.75717   2.59286
  8  12.45262   7.33096   5.55733   4.62891   4.04360   3.63312   3.32477   3.08187   2.88375   2.71786
  9  12.57607   7.45291   5.67781   4.74796   4.16125   3.74940   3.43971   3.19551   2.99611   2.82897
 10  12.68596   7.56161   5.78534   4.85434   4.26651   3.85356   3.54281   3.29755   3.09712   2.92897
 11  12.78497   7.65965   5.88242   4.95049   4.36175   3.94790   3.63626   3.39014   3.18886   3.01988
 12  12.87506   7.74893   5.97092   5.03821   4.44871   4.03411   3.72174   3.47488   3.27290   3.10322
 13  12.95771   7.83090   6.05222   5.11886   4.52871   4.11347   3.80048   3.55301   3.35042   3.18014
 14  13.03404   7.90666   6.12741   5.19348   4.60278   4.18700   3.87347   3.62547   3.42236   3.25157
 15  13.10497   7.97708   6.19734   5.26293   4.67175   4.25550   3.94150   3.69304   3.48947   3.31823
 16  13.17119   8.04287   6.26270   5.32786   4.73626   4.31960   4.00519   3.75633   3.55237   3.38073
 17  13.23330   8.10460   6.32405   5.38884   4.79687   4.37984   4.06507   3.81585   3.61154   3.43956
 18  13.29178   8.16274   6.38185   5.44631   4.85401   4.43666   4.12157   3.87203   3.66740   3.49511
 19  13.34703   8.21768   6.43649   5.50066   4.90806   4.49042   4.17504   3.92523   3.72031   3.54774
 20  13.39939   8.26976   6.48831   5.55220   4.95935   4.54144   4.22580   3.97573   3.77056   3.59774
 21  13.44914   8.31927   6.53757   5.60122   5.00812   4.58998   4.27411   4.02381   3.81840   3.64536
 22  13.49653   8.36644   6.58452   5.64795   5.05464   4.63628   4.32019   4.06968   3.86408   3.69082
 23  13.54178   8.41148   6.62936   5.69259   5.09909   4.68053   4.36425   4.11354   3.90774   3.73429
 24  13.58506   8.45459   6.67228   5.73533   5.14164   4.72290   4.40644   4.15555   3.94958   3.77596
 25  13.62657   8.49591   6.71344   5.77631   5.18245   4.76355   4.44692   4.19587   3.98975   3.81597
 26  13.66641   8.53559   6.75296   5.81568   5.22167   4.80262   4.48584   4.23464   4.02835   3.85442
 27  13.70472   8.57376   6.79098   5.85356   5.25940   4.84021   4.52329   4.27195   4.06553   3.89146
 28  13.74162   8.61053   6.82761   5.89006   5.29576   4.87644   4.55939   4.30792   4.10137   3.92718
 29  13.77721   8.64599   6.86295   5.92527   5.33086   4.91141   4.59424   4.34265   4.13597   3.96166
 30  13.81157   8.68022   6.89708   5.95928   5.36475   4.94519   4.62791   4.37620   4.16942   3.99499
 31  13.84479   8.71335   6.93008   5.99218   5.39754   4.97787   4.66048   4.40867   4.20178   4.02725
 32  13.87695   8.74540   6.96203   6.02403   5.42928   5.00951   4.69202   4.44012   4.23313   4.05850
 33  13.90810   8.77645   6.99299   6.05489   5.46006   5.04019   4.72261   4.47061   4.26352   4.08880
 34  13.93831   8.80657   7.02302   6.08483   5.48991   5.06995   4.75228   4.50019   4.29302   4.11821
 35  13.96764   8.83582   7.05217   6.11390   5.51889   5.09886   4.78110   4.52893   4.32168   4.14679

*Due to rounding, tabular values may be in error by one in the last unit.
TABLE 1 (Continued)
The Function A(r, p)*

r\p      1.1      1.2      1.3      1.4      1.5      1.6      1.7      1.8      1.9      2.0
  0        0        0        0        0        0        0        0        0        0        0
  1   .90909   .83333   .76923   .71428   .66667   .62500   .58823   .55556   .52632   .50000
  2  1.38528  1.28788  1.20401  1.13095  1.06667  1.00961   .95860   .91270   .87115   .83334
  3  1.70786  1.60038  1.50704  1.42506  1.35238  1.28739  1.22887  1.17586  1.12756  1.08334
  4  1.95176  1.83847  1.73960  1.65234  1.57460  1.50478  1.44164  1.38419  1.33164  1.28334
  5  2.14784  2.03078  1.92828  1.83752  1.75642  1.68335  1.61708  1.55660  1.50118  1.45000
  6  2.31177  2.19207  2.08701  1.99377  1.91027  1.83487  1.76633  1.70366  1.64606  1.59286
  7  2.45262  2.33096  2.22400  2.12891  2.04360  1.96645  1.89620  1.83187  1.77264  1.71786
  8  2.57607  2.45291  2.34448  2.24796  2.16125  2.08273  2.01114  1.94551  1.88500  1.82897
  9  2.68596  2.56161  2.45201  2.35434  2.26651  2.18689  2.11424  2.04755  1.98601  1.92897
 10  2.78497  2.65965  2.54909  2.45049  2.36175  2.28123  2.20769  2.14014  2.07775  2.01988
 11  2.87506  2.74893  2.63759  2.53821  2.44871  2.36744  2.29317  2.22488  2.16179  2.10322
 12  2.95771  2.83090  2.71889  2.61886  2.52871  2.44680  2.37191  2.30301  2.23931  2.18014
 13  3.03404  2.90666  2.79408  2.69348  2.60278  2.52033  2.44490  2.37547  2.31125  2.25157
 14  3.10497  2.97708  2.86401  2.76293  2.67175  2.58883  2.51293  2.44304  2.37836  2.31823
 15  3.17119  3.04287  2.92937  2.82786  2.73626  2.65293  2.57662  2.50633  2.44126  2.38073
 16  3.23330  3.10460  2.99072  2.88884  2.79687  2.71317  2.63650  2.56586  2.50043  2.43956
 17  3.29178  3.16274  3.04852  2.94631  2.85401  2.76999  2.69300  2.62203  2.55629  2.49511
 18  3.34703  3.21768  3.10316  3.00066  2.90806  2.82375  2.74647  2.67523  2.60920  2.54774
 19  3.39939  3.26976  3.15498  3.05220  2.95935  2.87477  2.79723  2.72573  2.65945  2.59774
 20  3.44914  3.31927  3.20424  3.10122  3.00812  2.92331  2.84554  2.77381  2.70729  2.64536
 21  3.49653  3.36644  3.25119  3.14795  3.05464  2.96961  2.89162  2.81968  2.75297  2.69082
 22  3.54178  3.41148  3.29603  3.19259  3.09909  3.01386  2.93568  2.86354  2.79663  2.73429
 23  3.58506  3.45459  3.33895  3.23533  3.14164  3.05623  2.97787  2.90555  2.83847  2.77596
 24  3.62657  3.49591  3.38011  3.27631  3.18245  3.09688  3.01835  2.94587  2.87864  2.81597
 25  3.66641  3.53559  3.41963  3.31568  3.22167  3.13595  3.05727  2.98464  2.91724  2.85442
 26  3.70472  3.57376  3.45765  3.35356  3.25940  3.17354  3.09472  3.02195  2.95442  2.89146
 27  3.74162  3.61053  3.49428  3.39006  3.29576  3.20977  3.13082  3.05792  2.99026  2.92718
 28  3.77721  3.64599  3.52962  3.42527  3.33086  3.24474  3.16567  3.09265  3.02486  2.96166
 29  3.81157  3.68022  3.56375  3.45928  3.36475  3.27852  3.19934  3.12620  3.05831  2.99499
 30  3.84479  3.71335  3.59675  3.49218  3.39754  3.31120  3.23191  3.15867  3.09067  3.02725
 31  3.87695  3.74540  3.62870  3.52403  3.42928  3.34284  3.26345  3.19012  3.12202  3.05850
 32  3.90810  3.77645  3.65966  3.55489  3.46006  3.37352  3.29404  3.22061  3.15241  3.08880
 33  3.93831  3.80657  3.68969  3.58483  3.48991  3.40328  3.32371  3.25019  3.18191  3.11821
 34  3.96764  3.83582  3.71884  3.61390  3.51889  3.43219  3.35253  3.27893  3.21057  3.14679
 35  3.99613  3.86422  3.74717  3.64215  3.54706  3.46027  3.38054  3.30686  3.23842  3.17456

*Due to rounding, tabular values may be in error by one in the last unit.
TABLE 1 (Continued)
The Function A(r, p)*

r\p      2.1      2.2      2.3      2.4      2.5      2.6      2.7      2.8      2.9      3.0
  0        0        0        0        0        0        0        0        0        0        0
  1   .47619   .45455   .43478   .41667   .40000   .38461   .37037   .35714   .34483   .33334
  2   .79877   .76705   .73781   .71078   .68571   .66239   .64064   .62030   .60124   .58334
  3  1.04267  1.00514   .97037   .93806   .90793   .87978   .85341   .82863   .80532   .78334
  4  1.23875  1.19745  1.15905  1.12324  1.08975  1.05835  1.02885  1.00104   .97481   .95000
  5  1.40268  1.35874  1.31778  1.27949  1.24360  1.20987  1.17810  1.14810  1.11974  1.09286
  6  1.54353  1.49763  1.45477  1.41463  1.37693  1.34145  1.30797  1.27631  1.24632  1.21786
  7  1.66698  1.61958  1.57525  1.53368  1.49458  1.45773  1.42291  1.38995  1.35868  1.32897
  8  1.77687  1.72828  1.68278  1.64006  1.59984  1.56189  1.52601  1.49199  1.45969  1.42897
  9  1.87588  1.82632  1.77986  1.73621  1.69508  1.65623  1.61946  1.58458  1.55143  1.51988
 10  1.96597  1.91560  1.86836  1.82393  1.78204  1.74244  1.70494  1.66932  1.63547  1.60322
 11  2.04862  1.99757  1.94966  1.90458  1.86204  1.82180  1.78368  1.74745  1.71299  1.68014
 12  2.12495  2.07333  2.02485  1.97920  1.93611  1.89533  1.85667  1.81991  1.78493  1.75157
 13  2.19588  2.14375  2.09478  2.04865  2.00508  1.96383  1.92470  1.88748  1.85204  1.81823
 14  2.26210  2.20954  2.16014  2.11358  2.06959  2.02793  1.98839  1.95077  1.91494  1.88073
 15  2.32421  2.27127  2.22149  2.17456  2.13020  2.08817  2.04827  2.01030  1.97411  1.93956
 16  2.38269  2.32941  2.27929  2.23203  2.18734  2.14499  2.10477  2.06647  2.02997  1.99511
 17  2.43794  2.38435  2.33393  2.28638  2.24139  2.19875  2.15824  2.11967  2.08288  2.04774
 18  2.49030  2.43643  2.38575  2.33792  2.29268  2.24977  2.20900  2.17017  2.13313  2.09774
 19  2.54005  2.48594  2.43501  2.38694  2.34145  2.29831  2.25731  2.21825  2.18097  2.14536
 20  2.58744  2.53311  2.48196  2.43367  2.38797  2.34461  2.30339  2.26412  2.22665  2.19082
 21  2.63269  2.57815  2.52680  2.47831  2.43242  2.38886  2.34745  2.30798  2.27031  2.23429
 22  2.67597  2.62126  2.56972  2.52105  2.47497  2.43123  2.38964  2.34999  2.31215  2.27596
 23  2.71748  2.66258  2.61088  2.56203  2.51578  2.47188  2.43012  2.39031  2.35232  2.31597
 24  2.75732  2.70226  2.65040  2.60140  2.55500  2.51095  2.46904  2.42908  2.39092  2.35442
 25  2.79563  2.74043  2.68842  2.63928  2.59273  2.54854  2.50649  2.46639  2.42810  2.39146
 26  2.83253  2.77720  2.72505  2.67578  2.62909  2.58477  2.54259  2.50236  2.46394  2.42718
 27  2.86812  2.81266  2.76039  2.71099  2.66419  2.61974  2.57744  2.53709  2.49854  2.46166
 28  2.90248  2.84689  2.79452  2.74500  2.69808  2.65352  2.61111  2.57064  2.53199  2.49499
 29  2.93570  2.88002  2.82752  2.77790  2.73087  2.68620  2.64368  2.60311  2.56435  2.52725
 30  2.96786  2.91207  2.85947  2.80975  2.76261  2.71784  2.67522  2.63456  2.59570  2.55850
 31  2.99901  2.94312  2.89043  2.84061  2.79339  2.74852  2.70581  2.66505  2.62609  2.58880
 32  3.02922  2.97324  2.92046  2.87055  2.82324  2.77828  2.73548  2.69463  2.65559  2.61821
 33  3.05855  3.00249  2.94961  2.89962  2.85222  2.80719  2.76430  2.72337  2.68425  2.64679
 34  3.08704  3.03089  2.97794  2.92787  2.88039  2.83527  2.79231  2.75130  2.71210  2.67456
 35  3.11474  3.05852  3.00549  2.95534  2.90779  2.86260  2.81956  2.77847  2.73920  2.70159

*Due to rounding, tabular values may be in error by one in the last unit.



















TABLE 2
The Function Dig.(p − 1)*

  p    Dig.(p − 1)        p    Dig.(p − 1)
  .1    -10.42375        1.6      .12605
  .2    - 5.28904        1.7      .20855
  .3    - 3.50252        1.8      .28499
  .4    - 2.56138        1.9      .35618
  .5    - 1.96351        2.0      .42278
  .6    - 1.54062        2.1      .48534
  .7    - 1.22002        2.2      .54429
  .8    -  .96501        2.3      .60004
  .9    -  .75493        2.4      .65290
 1.0    -  .57722        2.5      .70316
 1.1    -  .42375        2.6      .75105
 1.2    -  .28904        2.7      .79678
 1.3    -  .16919        2.8      .84055
 1.4    -  .06138        2.9      .88250
 1.5    +  .03649        3.0      .92278

*Due to rounding, tabular values may be in error by one in the last unit.


an inefficient estimation process it may happen that, on the strength
of the $\chi^2$-test, we reject a hypothesis which may really be true.


The observed distribution of number of absences of 318 workers 
in a particular division of a large steel corporation is given in the 
first two columns of Table 3. The observational period was six months 
and the absences were all classified as “absent without permission.” 
For the sample mean and variance we find 


$$m_1' = .66981 , \qquad m_2 = 1.52934 ,$$

and hence from (6a) and (6b) the moment estimates

$$\tilde{m} = .66981 , \qquad \tilde{p} = .52197 .$$
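
These figures follow directly from the observed frequencies in the first two columns of Table 3. A minimal sketch, assuming the variance is taken with divisor n (which reproduces the quoted value) and using hypothetical names for the data and the helper:

```python
# Observed number of workers by number of absences (Table 3, first two columns).
absences = {0: 217, 1: 44, 2: 29, 3: 11, 4: 11, 5: 2, 6: 4}

def moment_estimates(freq):
    """Sample size, mean, variance (divisor n) and the moment estimate (6b) of p."""
    n = sum(freq.values())
    mean = sum(r * f for r, f in freq.items()) / n
    var = sum(r * r * f for r, f in freq.items()) / n - mean ** 2
    p_tilde = mean ** 2 / (var - mean)
    return n, mean, var, p_tilde

if __name__ == "__main__":
    print(moment_estimates(absences))
```

Note that (6b) requires the variance to exceed the mean; otherwise the moment estimate of p is negative or undefined.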


Substituting these estimates into the negative binomial law (eq. 1)
and calculating expected frequencies, gives finally a total $\chi^2$ = 7.188
(columns 3-5 in Table 3). For such a value of $\chi^2$ and two degrees of
freedom, P = .03. We should therefore reject the hypothesis of
absence proneness, as described by the Greenwood-Yule model (2).
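
The expected frequencies and the total $\chi^2$ can be recomputed from equation (1). The sketch below is illustrative and rests on two assumptions: the cells for four or more absences are pooled, giving five cells and two degrees of freedom, and the probability for two degrees of freedom is obtained from exp(−$\chi^2$/2). Since it works with unrounded expectations, its total differs slightly from the 7.188 of Table 3.

```python
import math

observed = [217, 44, 29, 11, 11, 2, 4, 0]      # 0, 1, ..., 6 and "7 and over"

def expected_frequencies(m, p, n, r_max):
    """n * f(r) for r = 0, ..., r_max - 1, followed by the tail 'r_max and over'."""
    x = m / (p + m)
    f = (p / (p + m)) ** p
    out = []
    for r in range(r_max):
        out.append(n * f)
        f *= (p + r) / (r + 1) * x
    out.append(n - sum(out))
    return out

def chi_square(obs, exp, first_pooled):
    """Pool all cells from index first_pooled onward, then sum (o - e)^2 / e."""
    o = obs[:first_pooled] + [sum(obs[first_pooled:])]
    e = exp[:first_pooled] + [sum(exp[first_pooled:])]
    return sum((a - b) ** 2 / b for a, b in zip(o, e))

if __name__ == "__main__":
    exp_mom = expected_frequencies(0.66981, 0.52197, 318, 7)
    chi2 = chi_square(observed, exp_mom, 4)
    print([round(e, 1) for e in exp_mom])
    print(chi2, math.exp(-chi2 / 2.0))          # total and P for two degrees of freedom
```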


TABLE 3

Comparison of the Method of Moments and the Method of Maximum Likelihood
in Fitting the Negative Binomial in Studying Absence Proneness

                                  Method of Estimation
                          Moments                          Maximum Likelihood
Number of   Observed      Expected                         Expected
Absences    Frequency,    Frequency,   f_o − f_E    χ²     Frequency,   f_o − f'_E    χ²
            f_o           f_E                              f'_E
0            217           206.7        +10.3      .513     214.9        +2.1         .0…
1             44            60.6        -16.6     4.547      53.4        -9.4        1.6…
2             29            25.9        + 3.1      .371      23.4        +5.6        1.3…
3             11            12.3        - 1.3      .137      11.8        -0.8         …
4             11             6.1 }                            6.3 }
5              2             3.1 }      + 4.5     1.620       3.5 }      +2.5         .4…
6              4             1.6 }                            2.0 }
7 & Over       0             1.7 }                            2.7 }
Totals       318           318.0          0.0     7.188     318.0         0.0        3.…
                          D.F. = 2, P = .03                D.F. = 2, P = .18

However, as mentioned previously, the rejection of a true hypothesis
may be entirely due to what Fisher calls “errors of estimation”
if we fit with an inefficient method. By entering Figure 1 with
the values found for $\tilde{m}$ and $\tilde{p}$ we find from the graph

$$\mathrm{Eff.}(\tilde{p}) = .68 .$$


It follows that we employed an inefficient method of estimation
and the rejection of the above hypothesis may be groundless. We,
therefore, proceed to estimate the parameters of the negative binomial
law with the maximum-likelihood method. For computational
purposes it is convenient to write equation (21) as

$$\frac{1}{n} \sum_{i=1}^{n} A(r_i, \hat{p}) - 2.30258 \log_{10} \left( 1 + \frac{\hat{m}}{\hat{p}} \right) = 0 . \qquad (21a)$$

$\hat{m}$ is identical with $\tilde{m}$ = .66981 and n = 318. We must solve equation
(21a) for $\hat{p}$. We find three trial values for $\hat{p}$ in the neighborhood of
$\tilde{p}$ = .52197, say $\hat{p}$ = .3, .4, and .5.


The writer finds it convenient to write columns 1 and 2 of Table
3 on a strip of paper, seeing to it that the spacing of the figures is
the same as the spacing in Table 1. By placing the strip next to the
columns headed p = .3, .4, and .5 of Table 1 it is easy to find, after
continuous multiplication of the observed frequencies with the corresponding
tabular values and division by 318,

$$\frac{1}{318} \sum A(r, .3) = 1.25782 ,$$

$$\frac{1}{318} \sum A(r, .4) = .98108 , \quad \text{and}$$

$$\frac{1}{318} \sum A(r, .5) = .81169 ;$$

and finally by substitution of appropriate values into (21a),

     p               y = f(m, p)            δ            δ²

  p_{-1} = .3     y_{-1} = +.08450
                                         -.08719
  p_0    = .4     y_0    = -.00269                    +.05158
                                         -.03561
  p_1    = .5     y_1    = -.03830





By inverse interpolation we may now find the root of equation (21a),
i.e.

$$\hat{p} = p_0 + u h , \qquad (22)$$

where h is the unit tabulated (in this case h = 0.1) and

$$u = \frac{y_{-1} - y_{1}}{2\delta^{2}} - \sqrt{ \left( \frac{y_{-1} - y_{1}}{2\delta^{2}} \right)^{2} - \frac{2 y_{0}}{\delta^{2}} } . \qquad (23)$$

From the above values we have

$$\hat{p} = .40000 - .04303 \times .1 = .39570 .$$

The expected frequencies for these M.L. estimates of the law (eq. 1)
are given in the 6th column of Table 3.
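
The solution of equation (21) for the absence data can be sketched as follows. This is an illustration rather than the tabular procedure described above: A(r, p) is computed as a finite sum instead of being read from Table 1, the quadratic inverse interpolation uses the three trial values .3, .4 and .5, and a plain bisection (an added cross-check, not part of the paper) refines the root. The function names and the bracketing interval are assumptions.

```python
import math

absences = {0: 217, 1: 44, 2: 29, 3: 11, 4: 11, 5: 2, 6: 4}

def a_func(r, p):
    """A(r, p) as the finite sum of 1 / (p + v - 1) for v = 1, ..., r."""
    return sum(1.0 / (p + v - 1) for v in range(1, r + 1))

def likelihood_eq(p, freq):
    """Left-hand side of eq. (21), natural logarithms, with m fixed at the sample mean."""
    n = sum(freq.values())
    m_hat = sum(r * f for r, f in freq.items()) / n
    mean_a = sum(f * a_func(r, p) for r, f in freq.items()) / n
    return mean_a - math.log(1.0 + m_hat / p)

def inverse_interpolation(p0, h, freq):
    """Root of the parabola through p0 - h, p0, p0 + h, in the manner of (22) and (23).

    The minus sign before the square root is the branch that reproduces the
    worked example; other data may call for the other branch.
    """
    y_m1, y_0, y_1 = (likelihood_eq(p0 + k * h, freq) for k in (-1, 0, 1))
    d2 = y_1 - 2.0 * y_0 + y_m1
    half = (y_m1 - y_1) / (2.0 * d2)
    u = half - math.sqrt(half * half - 2.0 * y_0 / d2)
    return p0 + u * h

def bisection(freq, lo, hi, tol=1e-10):
    """Root of eq. (21) between lo and hi, assuming a sign change on that interval."""
    f_lo = likelihood_eq(lo, freq)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = likelihood_eq(mid, freq)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print([likelihood_eq(p, absences) for p in (0.3, 0.4, 0.5)])
    print(inverse_interpolation(0.4, 0.1, absences))
    print(bisection(absences, 0.3, 0.5))
```

The interpolated value and the bisection root need not agree to the last decimals, since the parabola is fitted on a grid of width 0.1.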

Out of five cells used for computing total $\chi^2$, four show an improvement
in comparison to the moment fit. Total $\chi^2$ has been reduced
to one half of its former value and, most important, P = .18
so that we may confidently accept the hypothesis of absence proneness.

Entering Figure 1 with the efficient estimates $\hat{m}$ and $\hat{p}$ we find
from the graph

$$\mathrm{Eff.}(\tilde{p}) = .63 ,$$

which is not vitally different from the result based on the moment
estimates. This indicates that the decision whether to proceed to a
maximum-likelihood fit, i.e.

$$\mathrm{Eff.}(\tilde{p}) \gtrless .80 ,$$

is not invalidated by using the moment estimates.
(b) Accident Proneness:


It is a well-known fact that certain accident distributions may
successfully be represented by a negative binomial law. It may be
shown (Chambers and Yule (8)) that parameter p is independent
of the time exposure.


Table 4 gives the distribution of minor accidents of two groups 
of workers of the same division as mentioned in the example on ab- 
sence proneness. The first group is composed of workers who had 
already had their full annual leave, whereas the second is made up 
of all those who had taken no leave. The observational period was 
six months. As the workers were all exposed to the same environ- 
mental risk, and as there is little likelihood that the leave group is, 
on the average, more prone to accidents than the non-leave group, 
we should expect to find similar estimates for parameter p from the 
two samples. 


TABLE 4*

Comparison of the Method of Moments and the Method of Maximum Likelihood
in Fitting the Negative Binomial in Studying Accident Proneness

Method of
Estimation                    Moments                          Maximum Likelihood
Number of     With annual leave   Without annual leave     With A.L.    Without A.L.
minor
accidents       f_o      f_E         f_o      f_E             f'_E          f'_E
 0               77      81.7         73      68.7            77.9          71.6
 1               36      33.6         28      34.5            35.8          32.8
 2               24      18.6         22      19.4            20.0          18.3
 3               13      11.2          8      11.3            11.9          10.8
 4                4       7.0          9       6.7             7.2           6.6
 5                3       4.4          5       4.0             4.5           4.1
 6                2       2.9          2       2.4             2.8           2.5
 7                1       1.9          2       1.5             1.8           1.6
 8                2       1.2          0        .9             1.1           1.0
 9                2        .8          1        .6              .7            .6
10                0        .5          1        .3              .5            .4
15                1       1.2          0        .7              .8            .7
n               165     165.0        151     151.0           165.0         151.0
m                     1.34545              1.33775         1.34545       1.33775
p                      .59345               .80479          .69870        .69659
χ²                      3.864                3.631           2.453         3.077
P                         .28                  .30             .49           .39

*The brackets in table indicate grouping for χ²-test.
The moment estimates are

    with annual leave :     $\tilde{p}$ = .593,
    without annual leave :  $\tilde{p}$ = .805.

The two estimates of p do not seem to agree. However, entering
Figure 1 with the moment estimates, we find

    with annual leave :     Eff.($\tilde{p}$) = .61,
    without annual leave :  Eff.($\tilde{p}$) = .68.

It is clear that the moment method is inefficient. The corresponding
maximum-likelihood estimates are

    with annual leave :     $\hat{p}$ = .699,
    without annual leave :  $\hat{p}$ = .697,

showing excellent agreement. The corresponding $\chi^2$-tests are given
in Table 4. In both cases the distributions fitted by the
maximum-likelihood method represent the observations much better than do
the moment graduations.


(c) Two-Hand Co-ordination Test: 


The Two-Hand Co-ordination Test is used primarily as a meas- 
ure of speed of performance. A study of the errors per se incurred 
on this test revealed differences in liability to making errors. For 
individuals this liability is remarkably constant from trial to trial. 
The split-half reliability of the errors compares favourably with the 
best of current performance tests. In view of the above the distribu- 
tion of test errors might be expected to follow a negative binomial 
law. Table 5 gives the distribution of total errors for ten trials on the 
Two-Hand Co-ordination Test for 504 subjects. 


The moment fit (P = .10), although acceptable, is not too satisfactory.
From Figure 1 we see that for

$$\tilde{m} = 7.28 \quad \text{and} \quad \tilde{p} = .63$$

the efficiency of the estimation process is very low, i.e.

$$\mathrm{Eff.}(\tilde{p}) \approx 0.4 .$$

We proceed to estimate the parameters by the maximum-likelihood
method. The resulting $\chi^2$-test gives P = .53. Once again the
“errors of estimation” have clouded the conclusions based on the
$\chi^2$-test. As the maximum-likelihood fit is entirely satisfactory, there
exists good reason to use the negative binomial law as a mathematical
model of Two-Hand Co-ordination errors.


TABLE 5* 


Comparison of the Method of Moments and the Method of Maximum Likelihood 
in Fitting the Negative Binomial to Errors on the Two-Hand Co-ordination Test 





























Number of   f_o      f_E      f'_E          Number of   f_o      f_E      f'_E
Errors      (Obs.)   (Mom.)   (Max. Lik.)   Errors      (Obs.)   (Mom.)   (Max. Lik.)
-_ . we 102.1 82.1 23 1 3.3 3.3 
1 58 59.3 57.5 24 3 3.0 3.0 
2 51 44.5 46.1 25 a rf 2.6 } 
3 49 35.9 38.6 26 2 2.5 2.4 
4 42 80.0 32.9 27 2 2.3 A | 
5 23 25.6 28.4 28 3 2.0 1.9 
6 30 22.1 24.7 29 3 1.9 1.7 
jf 20 19.3 21.6 30 3 1.7 1.6 
8 17 16.9 19.0 31 1 1.5 1.4 
9 18 14.9 16.7 82 0 1.4 12 
10 16 13.2 14.8 383 A 
11 11 11.8 13.1 388 2 7.5 5:2 
12 vi 10.5 11.6 40 3 
3 10 9.4 10.3 42 2 
14 7 8.4 9.2 44 | 
15 6 7.5 8.2 47 1 
16 5 6.8 7.3] 49 1 (fees 5.6 
17 7 6.1 6.5) 51 1 
18 2 5.5 we 68 1 
19 4 5.0 5.2 73 ‘4 
20 5 4.5 4.6 n 504 504.0 504.0 
m 7.28373 7.28373 
21 3 4.0 4.1 Pp -63126 -77510 
x* 28.495 18.888 
22 5 3.7 3.7 r 10 53 





*The brackets in table indicate grouping for x?-test. 
A SUCCESSIVE APPROXIMATION METHOD OF
MAXIMIZING TEST VALIDITY

The ratio of item validity to item-total correlation can be used
to select items which will tend to yield the maximum correlation
with a criterion. Items to be retained are identified by comparing
the ratio for each item with the validity of the original test. Further
improvement of the validity in the experimental sample can be
obtained by adding items to or removing items from the selected
nucleus, according to recomputed ratios involving the correlations
of the items with the nucleus and evaluated by means of a revised
cut-off point. With slight variations, the method may be used for
interest and personality tests as well as for aptitude material. The
principal advantage over previous methods is that for any cycle of
the analysis an exact cut-off point is provided.


The problem of choosing from among a number of items that 
group yielding the maximum correlation with a criterion has long 
troubled testmakers. Since out of n items one may use any possible
combination of 2, 3, 4, ... items, there would be $2^n - (n + 1)$
validity coefficients to compare in order to find the optimal solution
empirically. Such a task is obviously prohibitive. The rational
approach would be to find the multiple correlation between the items
and the criterion, under the restriction that all weights be either 1 
or 0. No feasible routine for this procedure has been proposed. 
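
To give a sense of scale, the count of candidate tests can be tabulated directly. This is only an arithmetic sketch; the function name `n_candidate_tests` is ours, and the 64-item case is chosen because a test of that length is analyzed later in the paper.

```python
from math import comb


def n_candidate_tests(n: int) -> int:
    """Number of item subsets that contain at least two of n items."""
    return 2 ** n - (n + 1)


# Cross-check the closed form against a direct sum of binomial coefficients.
assert n_candidate_tests(10) == sum(comb(10, k) for k in range(2, 11))

for n in (10, 20, 64):
    print(n, n_candidate_tests(n))
```

Even ten items give more than a thousand validity coefficients to compare, and the count doubles (roughly) with every added item.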

Various attempts have been made to find a practical solution. 
Horst (4) developed the “Method of Successive Residuals,” which 
can be used with either differential or unit weights and which builds 
up the test by starting with the item of highest validity and then 
determining which of the n — 1 items will correlate highest with the 
unpredicted part of the criterion, a process which is continued until 
the “optimum” test is selected. The Toops L-Method (8) is another 
“build-up” technique for selecting either items of a test or tests for 
a battery using unit gross-score weights. Wherry and Gaylord (9) 
modified this procedure so that it can be used with differential 
weights as well. In all of these methods, the amount of labor is con- 
siderable. 

Other methods have been devised in which some type of index
is used to determine the relative contribution of each item to the
total test validity. On this basis a predetermined number of items
are eliminated. Horst (5) published such a method which involves
the computation of the mean criterion score and the mean total test
score of all subjects who answered correctly on any particular item.
The index he obtains can be shown to be equivalent to

$$\frac{r_{jc}\,\sigma_j\,\sigma_c}{r_{jt}\,\sigma_j\,\sigma_t},$$

where the subscript $j$ refers to the item, $c$ to the criterion and $t$ to
the total test, and the correlations are point-biserials. A scatter diagram
is plotted with the numerator as ordinate and the denominator as
abscissa. All items are retained for which $r_{jc}$ is positive while $r_{jt}$ is
negative, plus all items for which both correlations are positive and
which lie above an arbitrary radius drawn from the origin with slope
such that the predetermined number of items are excluded. This is
equivalent to retaining the items with the largest index. The procedure
may be repeated using the selected items as a test nucleus if
further refinement is desired. Horst claims that this method is less
time-consuming than his previous technique and at the same time
yields higher validities for the selected group of items. Gulliksen
(3) recently published a similar graphic item selection procedure in
which $r_{jc}\sigma_j$ is plotted against $r_{jt}\sigma_j$ and those items retained which
lie above a radius drawn from the origin. In this method no consideration
is given to items having a negative correlation with the total
score. It is also assumed that $\sum_j r_{jt}\sigma_j$ is nearly equal to $\sum_j r_{js}\sigma_j$, where
$r_{js}$ is the correlation between the item and the score on the selected
subset of items. However, the elimination of a group of items may
have a considerable effect on the item-total correlation of a retained
item.

Richardson and Adkins (6) have proposed an index for deter- 
mining the contribution of the items based on the relative weight 
with which the item would enter into the prediction of the criterion 
when added to a test exclusive of that item. The index,

$$\frac{r_{jc} - r_{jt}\,r_{tc}}{(r_{tc} - r_{jc}\,r_{jt})\,\sigma_j},$$

is the ratio of the beta weight of the item to the beta weight of the 
test divided by the standard deviation of the item. The items hav- 
ing the lowest indices are dropped. In this method again no adjust- 
ment is suggested for the change in weight resulting from successive 
elimination of items. 

Flanagan (2) has suggested a short method of item selection
in which a nucleus of the most valid items is first selected, as judged 
by any convenient type of correlation with the criterion. Items are 
added to or subtracted from this nucleus by comparing the item- 
nucleus correlation of each item with the item-criterion correlation. 
Those items which have a higher correlation with the criterion score 
than with the nucleus are retained while those having a lower corre- 
lation with the criterion than with the nucleus are dropped. This pro- 
cess may be repeated until no further improvement in validity re- 
sults. Flanagan remarks that the first approximation secures a very 
large proportion of the possible improvement. Thorndike (7) has 
suggested an adaptation of Flanagan’s technique utilizing the partial 
correlation of each item with the criterion, holding constant the vari- 
ability associated with the nucleus. 


In determining the best technique to be used to maximize the 
validity of a test we should note that in every method the item in- 
dices are subject to sampling fluctuations and all error variance is 
weighted in favor of the test-maker. Thus there is no reason to sus- 
pect that any one of the methods, including the method developed 
here, is superior from the standpoint of greater stability in the re- 
sulting validity coefficient for subsequent samples. The choice of 
method, therefore, seems to depend mainly on the work involved for 
the obtained increase in validity. 


A method of item selection is developed and presented here 
which, while it has points in common with practically all of the short 
methods mentioned above, has certain unique features and results in 
a reasonably exact solution of the problem of selecting items yield- 
ing maximum validity. This method of “successive approximations” 
takes into account the changes in item-total correlations which result 
after the first selection is made and as a consequence adds and sub- 
tracts additional items to the original selection, each additional cycle 
resulting in a closer approximation to the “perfect” solution. An 
exact cut-off point is provided for the number of items to be in- 
cluded at each cycle. 

Consider an experimental test of n items which has been admin- 
istered to N individuals. The items may be scored in any systematic 
manner, such as most preferable (1), neutral (0), and least pref- 
erable (—1), or simply as right or wrong. Product moment correla- 
tions are obtained between each item of the test and the criterion, 
between each item and the total score, and between the total score 
and the criterion. 








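Using the following notation,

    $c_i$ = the deviation of an individual's criterion score from the
            mean,
    $x_{ij}$ = deviation of an individual's score on a particular item $j$
            from the mean of that item, and
    $t_i$ = deviation of an individual's total score from the mean
            score,

we have

$$r_{ct} = \frac{\sum_{i=1}^{N} c_i\,(x_{i1} + x_{i2} + \cdots + x_{in})}{N\sigma_c\sigma_t}
        = \frac{N\,(r_{1c}\sigma_1\sigma_c + r_{2c}\sigma_2\sigma_c + \cdots + r_{jc}\sigma_j\sigma_c + \cdots + r_{nc}\sigma_n\sigma_c)}{N\sigma_c\sigma_t}
        = \frac{\sum_{j=1}^{n} r_{jc}\sigma_j}{\sigma_t}, \tag{1}$$

but

$$\sigma_t^2 = \frac{\sum_{i=1}^{N} (x_{i1} + x_{i2} + \cdots + x_{in})\,t_i}{N}
            = \sum_{j=1}^{n} r_{jt}\sigma_j\sigma_t. \tag{2a}$$

Dividing (2a) by $\sigma_t$, we find

$$\sigma_t = \sum_{j=1}^{n} r_{jt}\sigma_j. \tag{2b}$$

Substituting (2b) in (1) yields

$$r_{ct} = \frac{\sum_j r_{jc}\sigma_j}{\sum_j r_{jt}\sigma_j}. \tag{3}$$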
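
Equation (3) is an algebraic identity and holds for any set of item scores, which makes it easy to check numerically. The sketch below uses simulated right-wrong items; the data, the seed, and the helper names `sd` and `corr` are illustrative assumptions, not material from the paper. Standard deviations use N (not N − 1) in the denominator, matching the derivation.

```python
import random
from math import sqrt


def sd(v):
    """Standard deviation with N in the denominator."""
    m = sum(v) / len(v)
    return sqrt(sum((x - m) ** 2 for x in v) / len(v))


def corr(u, v):
    """Product-moment correlation of two equally long score lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (sd(u) * sd(v))


random.seed(0)
N, n = 200, 8  # subjects, items (hypothetical sizes)
ability = [random.gauss(0, 1) for _ in range(N)]
# items[j][i] is the 0/1 score of subject i on item j.
items = [[1 if a + random.gauss(0, 1) > 0 else 0 for a in ability]
         for _ in range(n)]
criterion = [a + random.gauss(0, 1) for a in ability]
total = [sum(col[i] for col in items) for i in range(N)]

lhs = corr(total, criterion)                          # r_ct computed directly
rhs = (sum(corr(x, criterion) * sd(x) for x in items)
       / sum(corr(x, total) * sd(x) for x in items))  # right side of (3)
print(lhs, rhs)
```

The two printed values agree up to floating-point rounding, whatever data are substituted.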


From equation (3) it can be seen that the validity of a test is
increased if the validity of any item is increased or if the correlation
of any item with the total score is decreased. Accordingly, if one
considers a test, all of whose items have the same difficulty and validity
and the same item-total correlation, the validity of the test
as a whole would equal the validity of any item divided by its correlation
with the total test, and the larger this ratio, the greater the
validity. It would appear, therefore, that for a group of relatively
homogeneous items, those items for which this ratio is the largest
would yield a test approaching the maximum validity. Items, however,
differ in their means, standard deviations, and correlations with
other items. By a more precise development of what happens when
a single item j is removed from a test, it may be possible to estimate
which items should be left in and which should be removed. Since


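$$r_{c(t-j)} = \frac{\sum c\,t - \sum c\,x_j}{N\sigma_c\sigma_{t-j}}
            = \frac{r_{ct}\sigma_t - r_{jc}\sigma_j}{\sigma_{t-j}}, \tag{4}$$

for $r_{c(t-j)}$ to be larger than $r_{ct}$

$$\frac{r_{ct}\sigma_t - r_{jc}\sigma_j}{\sigma_{t-j}} > r_{ct}.$$

Multiplying both sides of the inequality by $\sigma_{t-j}$ and collecting terms,
we have

$$r_{ct}\sigma_t - r_{jc}\sigma_j > r_{ct}\sigma_{t-j},$$

$$r_{ct}\,(\sigma_t - \sigma_{t-j}) > r_{jc}\sigma_j.$$

If $(\sigma_t - \sigma_{t-j})$ is positive,

$$\frac{r_{jc}\sigma_j}{\sigma_t - \sigma_{t-j}} < r_{ct}; \tag{5}$$

while if $(\sigma_t - \sigma_{t-j})$ is negative,

$$\frac{r_{jc}\sigma_j}{\sigma_t - \sigma_{t-j}} > r_{ct}. \tag{6}$$

In order to put the quantity $(\sigma_t - \sigma_{t-j})$ into more usable form, we
note that

$$\sigma_t^2 - \sigma_{t-j}^2 = \sigma_t^2 - (\sigma_t^2 - 2r_{jt}\sigma_j\sigma_t + \sigma_j^2)
                            = 2r_{jt}\sigma_j\sigma_t - \sigma_j^2. \tag{7}$$

Since $\sigma_t^2 - \sigma_{t-j}^2 = (\sigma_t - \sigma_{t-j})(\sigma_t + \sigma_{t-j})$, it follows that

$$\sigma_t - \sigma_{t-j} = \frac{2r_{jt}\sigma_j\sigma_t - \sigma_j^2}{\sigma_t + \sigma_{t-j}}
  = \left(\frac{2\sigma_t}{\sigma_t + \sigma_{t-j}}\,r_{jt} - \frac{\sigma_j}{\sigma_t + \sigma_{t-j}}\right)\sigma_j. \tag{8}$$

$\sigma_t + \sigma_{t-j}$ very closely approximates $2\sigma_t$, so

$$\sigma_t - \sigma_{t-j} = (r_{jt} - \sigma_j/2\sigma_t)\,\sigma_j. \tag{9}$$

Substituting (9) in (5) and (6) yields

$$\frac{r_{jc}}{r_{jt} - \sigma_j/2\sigma_t} < r_{ct}, \quad \text{if } r_{jt} - \sigma_j/2\sigma_t \text{ is positive}; \tag{10a}$$

$$\frac{r_{jc}}{r_{jt} - \sigma_j/2\sigma_t} > r_{ct}, \quad \text{if } r_{jt} - \sigma_j/2\sigma_t \text{ is negative}. \tag{10b}$$

Thus we see that if the validity of the test is to be increased by
dropping the item being considered, $r_{jc}/(r_{jt} - \sigma_j/2\sigma_t)$ for that item should
be less than the validity $r_{ct}$ if the denominator is positive, and greater
than this validity if the denominator is negative.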

For tests where the items are numerous and are scored as right
or wrong and where the standard deviation of the total test is greater
than 5, $\sigma_j/2\sigma_t$ is .05 or less and can be ignored for the first approximation
of the new test form. Then the inequalities (10a) and (10b)
reduce simply to

$$r_{jc}/r_{jt} < r_{ct} \quad (r_{jt} > 0) \tag{11a}$$

$$r_{jc}/r_{jt} > r_{ct} \quad (r_{jt} < 0). \tag{11b}$$

Thus if the correlation of an item with the total test is positive
and the item index $r_{jc}/r_{jt}$ is less than the validity of the test, the
validity will be raised by dropping that item from the test; whereas
if the correlation of the item with the total test is negative, the index
should be greater than the validity of the test if the item is to
be removed.
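
The single-item removal rule can be tried out on simulated data. In the sketch below, inequalities (5) and (6) are exact, so their verdict must match the sign of the recomputed change in validity, while (10a) and (10b) rest on the approximation in (9) and are simply counted for agreement. The data, the seed, the item weights and the helper names are illustrative assumptions, not taken from the paper.

```python
import random
from math import sqrt


def sd(v):
    """Standard deviation with N in the denominator."""
    m = sum(v) / len(v)
    return sqrt(sum((x - m) ** 2 for x in v) / len(v))


def corr(u, v):
    """Product-moment correlation of two equally long score lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (sd(u) * sd(v))


random.seed(1)
N, n = 300, 12  # hypothetical sizes
ability = [random.gauss(0, 1) for _ in range(N)]
# Hypothetical items: the first half track ability, the rest are mostly noise.
weights = [1.0 if j < n // 2 else 0.1 for j in range(n)]
items = [[1 if w * a + random.gauss(0, 1) > 0 else 0 for a in ability]
         for w in weights]
criterion = [a + random.gauss(0, 1) for a in ability]
total = [sum(col[i] for col in items) for i in range(N)]
r_ct, s_t = corr(total, criterion), sd(total)

results = []  # (item number, exact gain, verdict of (5)/(6), verdict of (10a)/(10b))
for j, x in enumerate(items):
    rest = [t - xi for t, xi in zip(total, x)]   # total score without item j
    gain = corr(rest, criterion) - r_ct          # exact change in validity
    r_jc, r_jt, s_j = corr(x, criterion), corr(x, total), sd(x)
    diff = s_t - sd(rest)                        # sigma_t - sigma_(t-j)
    exact = (r_jc * s_j / diff < r_ct) if diff > 0 else (r_jc * s_j / diff > r_ct)
    denom = r_jt - s_j / (2 * s_t)               # denominator of (10a)/(10b)
    approx = (r_jc / denom < r_ct) if denom > 0 else (r_jc / denom > r_ct)
    results.append((j + 1, gain, exact, approx))

agree = sum(1 for _, gain, _, approx in results if approx == (gain > 0))
print(f"(10a)/(10b) agree with the exact outcome for {agree} of {n} items")
```

The comparison is made one item at a time, which is the only situation the derivation covers; removing several items at once changes the remaining item-total correlations.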

Actually we are interested in removing several items from a 
test at the same time. The above considerations would hold only 
when such changes have no effect on the remaining items. However, 
when several items are removed, the correlation of each item with
the score on the nucleus of remaining items is changed somewhat.
In practice, therefore, we would probably find that the removal of 
all items which do not meet the criteria indicated above will result 
in some items being removed which would now add to the validity 
of the selected items, while some which have been retained would 
no longer meet the recalculated criteria. For this reason the first 
selection should be only a trial selection, and it is necessary to recal- 
culate all of the item sub-total score correlations based on this new 
nucleus of items selected by the index, and compare the new indices 
with a recalculated test validity. 

At this point it becomes necessary to determine which items to 
add to a given nucleus of items in order to increase further the va- 
lidity of this nucleus taken as a test. Proceeding as above and with 
the same notation, let us determine when an item added to a test will 
increase the validity. 




















GOLDINE C. GLESER AND PHILIP H. DUBOIS 


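$$r_{c(t+j)} = \frac{r_{ct}\sigma_t + r_{jc}\sigma_j}{\sigma_{t+j}} > r_{ct}, \tag{12}$$

$$r_{jc}\sigma_j > r_{ct}\,(\sigma_{t+j} - \sigma_t),$$

$$\frac{r_{jc}\sigma_j}{\sigma_{t+j} - \sigma_t} > r_{ct} \quad \text{if } (\sigma_{t+j} - \sigma_t) > 0. \tag{13}$$

By the same method used for equations (7) and (8), we find

$$\sigma_{t+j} - \sigma_t = \left(\frac{2\sigma_t}{\sigma_t + \sigma_{t+j}}\,r_{jt} + \frac{\sigma_j}{\sigma_t + \sigma_{t+j}}\right)\sigma_j,$$

or

$$\sigma_{t+j} - \sigma_t = (r_{jt} + \sigma_j/2\sigma_t)\,\sigma_j, \tag{14}$$

so that

$$\frac{r_{jc}}{r_{jt} + \sigma_j/2\sigma_t} > r_{ct}, \quad \text{if } (r_{jt} + \sigma_j/2\sigma_t) > 0, \tag{15a}$$

and

$$\frac{r_{jc}}{r_{jt} + \sigma_j/2\sigma_t} < r_{ct}, \quad \text{if } (r_{jt} + \sigma_j/2\sigma_t) < 0. \tag{15b}$$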


Thus we see that an item not included in the first selection may 
now be selected if its correlation with the nucleus of selected items 
is such that it conforms to the inequalities (15a) or (15b), while 
items in the nucleus may now be discarded if they conform to
inequalities (10a) or (10b), where $r_{ct}$ in this case refers to the
correlation of the selected items with the criterion.

In light of the considerations stated above, there are two practi- 
cal methods for selecting items for a test. The method to be used 
depends on the degree of precision desired and the type of test ma- 
terial to which it is applied. 


Method I: 


This method is suitable for tests consisting of items which are 
fairly homogeneous as to subject matter for which one desires to 
choose quickly those items which will increase the validity for a 
particular criterion. The items must be of the type scored right or 
wrong. In this case use of the index alone will increase the validity 
substantially. The procedure is to: 

1) Obtain all of the item-criterion and item-total point-biserial 
correlations for each item. 


2) Calculate for each item its index $r_{jc}/r_{jt}$.

3) For items with positive item-total correlations select those
items for which the index is greater than the validity of the original
form of the test.

4) If negative item-total correlations occur, then such items
should be retained if the ratio $r_{jc}/r_{jt}$ is less than the validity of the
original test.
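
The four steps of Method I can be sketched in a few lines once the correlations are in hand. The function name `method_one` is ours; the illustrative input uses the item-criterion and item-total correlations of the hypothetical ten-item example in Table 2 of this paper, whose full-test validity is .60. Skipping an item whose item-total correlation is exactly zero is an assumption of this sketch, since the text does not cover that case.

```python
def method_one(r_jc, r_jt, r_ct):
    """Return the 1-based numbers of the items retained by Method I.

    r_jc: item-criterion correlations, r_jt: item-total correlations,
    r_ct: validity of the original test.
    """
    kept = []
    for item, (validity, item_total) in enumerate(zip(r_jc, r_jt), start=1):
        if item_total == 0:
            continue  # index undefined; not covered by the text (assumption)
        index = validity / item_total
        if item_total > 0 and index > r_ct:    # step 3
            kept.append(item)
        elif item_total < 0 and index < r_ct:  # step 4
            kept.append(item)
    return kept


# Correlations as read from Table 2 (hypothetical ten-item example).
r_jc = [.23, .03, .73, -.26, .76, -.32, .54, .02, .61, .32]
r_jt = [.36, .69, .26, .29, .56, .38, .26, .66, .48, .64]
print(method_one(r_jc, r_jt, r_ct=.60))
```

The comparison is deliberately made against the validity of the original test, not against an arbitrary number of items to keep, which is the feature the authors stress.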


Method II: 

This method is applicable to a more heterogeneous test or one 
for which it is desired to make a more exact selection. The proce- 
dure follows: 

1) Obtain a nucleus of items as in Method I or, considering 
only the item validities, pick out those items which have positive and 
substantial correlations with the criterion. A working rule might be 
to use items with validities significantly greater than zero. Since 
further selection will be made and criteria exist for either adding or 
removing items, it is possible to make the first selection by this alter- 
nate method and thus save the time involved in finding all of the 
original item-total score correlations. 

2) Score the papers on the basis of the items selected in Step
(1) and obtain the correlations ($r_{js}$'s) of all the original items with
the subtotal score and also the sub-test-criterion correlation, $r_{cs}$.

3) Obtain the ratio $r_{jc}/(r_{js} \pm \sigma_j/2\sigma_s)$ for each item, where the minus
sign is used for those items included in the scoring (i.e., those selected
in Step 1) and the plus sign is used for items not in the first
selection.

4) The revised test will now consist of those items having a
positive denominator for which this ratio is larger than the validity
$r_{cs}$ and those items for which the denominator of the index is negative
and the index is less than $r_{cs}$. Some items in the first nucleus
will probably be rejected, while other items not included in the first
nucleus will now be accepted.

5) Steps 2, 3, and 4 may be repeated after each new selection 
until no further changes occur. Usually a second or third adjustment 
is sufficient for practical purposes. Particularly where the correla- 
tions are obtained using some type of grouping of the scores, a 
change in one or two items will not affect the correlations, so that the 
second adjustment will give the desired result within the degree of 
accuracy of the correlations. 
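
The cycle of Steps 2 to 5 can be sketched as a short loop over raw scores. Everything specific in the example is an assumption made for illustration: the helper names, the .10 validity threshold used as the working rule in Step 1, the treatment of a zero denominator, and the data, which are the 1,024 possible answer patterns on ten right-wrong items (so that the items are exactly uncorrelated) with a criterion built from items 1, 3, 5, 7 and 9 as in the paper's hypothetical illustration.

```python
from itertools import product
from math import sqrt


def sd(v):
    """Standard deviation with N in the denominator."""
    m = sum(v) / len(v)
    return sqrt(sum((x - m) ** 2 for x in v) / len(v))


def corr(u, v):
    """Product-moment correlation of two equally long score lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (sd(u) * sd(v))


def method_two(items, criterion, nucleus, max_cycles=10):
    """Repeat Steps 2-4 until the selection stops changing.

    items[j] holds every subject's score on item j; nucleus is a
    non-empty set of 0-based item positions from Step 1.
    """
    nucleus = set(nucleus)
    for _ in range(max_cycles):
        sub = [sum(items[j][i] for j in nucleus) for i in range(len(criterion))]
        r_cs, s_s = corr(sub, criterion), sd(sub)            # Step 2
        revised = set()
        for j, x in enumerate(items):
            sign = -1.0 if j in nucleus else 1.0              # Step 3
            denom = corr(x, sub) + sign * sd(x) / (2 * s_s)
            if denom == 0:
                keep = j in nucleus  # index undefined: leave as is (assumption)
            else:
                index = corr(x, criterion) / denom
                keep = index > r_cs if denom > 0 else index < r_cs  # Step 4
            if keep:
                revised.add(j)
        if revised == nucleus:                                # Step 5
            break
        nucleus = revised
    return sorted(nucleus)


subjects = list(product((0, 1), repeat=10))   # hypothetical answer patterns
items = [[s[j] for s in subjects] for j in range(10)]
criterion = [s[0] + 2 * s[2] + s[4] + s[6] + s[8] + 2 for s in subjects]

start = {j for j in range(10) if corr(items[j], criterion) > 0.10}  # Step 1
selected = method_two(items, criterion, start)
subtotal = [sum(items[j][i] for j in selected) for i in range(len(subjects))]
validity = corr(subtotal, criterion)
print([j + 1 for j in selected], round(validity, 4))
```

With real data the loop would normally run two or three cycles before the selection settles; `max_cycles` only guards against a selection that oscillates.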

This method was used on the revised Object-Aperture Test (1)
consisting originally of 64 items. The test as a whole had a validity 
of .53 with grades in General Engineering Drawing. Selection of 
items on the basis of Method I resulted in a choice of 39 items with 
a validity of .67. On the basis of a second and third adjustment as 
indicated above, three items were added and two removed, result- 
ing in the selection of 40 items with a correlation of .69 with the 
criterion. 


In dealing with questionnaires, personality inventories, and bio- 
graphical data blanks where the scoring for each item is determined 
empirically, the item validities will indicate both which items will 
enter into the initial nucleus and the direction of scoring of each 
item. Once the keying has been determined so that $r_{jc} \geq 0$ for all
items, selection proceeds in the same fashion as indicated above. 

In order to illustrate how Method II works, the following hy- 
pothetical problem is presented here. Table 1 presents hypothetical 
scores for 20 people on ten items and the criterion score, the latter 
obtained by the following formula: 


c = item 1 + 2 times item 3 + item 5 + item 7 + item 9 + 2.


The correlation between the “test” and “criterion” scores is .60. The
second, third, and fourth columns of Table 2 show the item-criterion
correlations and the item total-score correlations and also the ratio
$r_{jc}/r_{jt}$ for each item. It can be seen that the only items for which
the ratio is above .60 are items 1, 3, 5, 7, and 9. These, of course,
are the items which will give a maximum correlation with the
“criterion.” Recomputing the validity $r_{cs}$ for these five items gives .97.
However, under the assumption that nothing was known as to the
true source of the criterion, the new item-total correlations were
calculated and also the ratios $r_{jc}/(r_{js} \pm \sigma_j/2\sigma_s)$ for each item, as shown in
the last three columns. The new indices indicate that no further
changes should be made.

                                TABLE 1
        Hypothetical Test Scores on Each Item and Criterion Scores

Subject        Total Score        Criterion Score
S_1                 9                    8
S_2                 8                    8
S_3                 5                    8
S_4                 7                    7
S_5                 9                    7
S_6                 7                    7
S_7                 5                    6
S_8                 9                    6
S_9                 3                    6
S_10                5                    5
S_11                7                    5
S_12                6                    5
S_13                4                    5
S_14                7                    4
S_15                5                    4
S_16                2                    4
S_17                5                    3
S_18                1                    3
S_19                5                    3
S_20                4                    2

[Scores on the ten individual items are illegible.]

σ_j (items 1-10):  .48  .48  .50  .49  .49  .49  .50  .49  .50  .46

N = 20          ΣC = 106         ΣTC = 646
ΣT = 113        ΣC² = 626        r_ct = .60
ΣT² = 735       σ_c = 1.79
σ_t = 2.20

                                TABLE 2
 Item Correlations and Ratios for Selection of Items in Hypothetical Test

                        r_jc/r_jt                 r_js ± σ_j/2σ_s    r_jc/(r_js ± σ_j/2σ_s)
Item    r_jc    r_jt    Cut-off: .60    r_js                         Cut-off: .97
  1      .23     .36       .64           .33           .17               1.35
  2      .03     .69       .04           .08           .23                .13
  3      .73     .26      2.81           .54           .37               1.97
  4     -.26     .29      -.90          -.25          -.08               3.25*
  5      .76     .56      1.36           .79           .62               1.23
  6     -.32     .38      -.84          -.25          -.08               4.00*
  7      .54     .26      2.08           .63           .46               1.17
  8      .02     .66       .03           .03           .20                .10
  9      .61     .48      1.27           .68           .51               1.20
 10      .32     .64       .50           .31           .47                .68





*Since Inequality (10b) applies, these items are rejected.

By Index 1, items 1, 3, 5, 7, 9 are selected. With the test rescored on these 5 items σ_s = 1.47,
r_cs = .97.
Index 2 indicates no further selection is necessary.
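
The second index of Table 2 can be recomputed from the tabled correlations. The sketch below applies Steps 3 and 4 of Method II to the values as read from Tables 1 and 2; the entries for r_js of items 1 and 10 are inferred from the printed denominators and should be treated as assumptions. Because the table carries only two decimals, recomputed indices can differ from the printed ones in the second decimal.

```python
# Values as read from Tables 1 and 2 (r_js for items 1 and 10 inferred
# from the printed denominators; an assumption of this sketch).
r_jc = [.23, .03, .73, -.26, .76, -.32, .54, .02, .61, .32]
r_js = [.33, .08, .54, -.25, .79, -.25, .63, .03, .68, .31]
sigma_j = [.48, .48, .50, .49, .49, .49, .50, .49, .50, .46]
nucleus = {1, 3, 5, 7, 9}     # items selected by Index 1
sigma_s, r_cs = 1.47, .97     # five-item test: standard deviation, validity

revised = []
for item in range(1, 11):
    k = item - 1
    sign = -1 if item in nucleus else 1   # minus for items already scored
    denom = r_js[k] + sign * sigma_j[k] / (2 * sigma_s)
    index = r_jc[k] / denom
    keep = index > r_cs if denom > 0 else index < r_cs
    if keep:
        revised.append(item)
    print(f"item {item:2d}: denominator {denom:+.2f}, index {index:5.2f}, keep={keep}")

print(revised)
```

Items 4 and 6 have negative denominators and indices above the cut-off, so they stay out, and the revised selection is the same five items as before.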

BOOK REVIEW


JOHN VON NEUMANN and OSKAR MORGENSTERN. Theory of Games and Economic
Behavior. (2nd Revised Edition) Princeton University Press, 1947, pp. xviii
+ 641.


A review of this book at this late date requires explanation. The reviewer is 
aware of the existence of a good many reviews, and he does not hesitate to admit 
that some of these reviews are good reviews. Good in that they distill much of the 
spirit of this 650-page opus in a mere 15-20 page paper of an expository nature 
(1), (2). As is understandable, most of the reviews were published in journals of 
economics or mathematics. However, the impact of this book transcends these 
fields: The problems it raises for scientists of any quantitative denomination are 
analogous to those raised for staff officers in the critique of maneuvers in which a 
new weapon has been tested. 


With this in mind let us confess without further ado that this is a difficult book. 
What makes it difficult is not what is expected from the reader in terms of mathe- 
matical background or familiarity with the facts and problems of economics. The 
book is difficult because it expresses new ideas and uses new unfamiliar tech- 
niques to buttress these ideas. The authors try to develop a general theory of 
economic behavior de novo. In their concern for rigor they proceed in a manner 
that makes it hard for them to give the reader, through examples, a feeling for 
the power and the beauty of the new approach. 

The authors—John von Neumann, a mathematician’s mathematician and Os- 
kar Morgenstern, a well known economist of the Austrian school—start out by 
stating their credo: Traditional mathematical economics has been unsuccessful 
because of the tools it has used. They were the tools of the differential calculus 
forged in the birth pangs of Newtonian Physics. Now the complexity of social 
phenomena is at least equal to the complexity of those encountered in Physics. 
“It is therefore to be expected—or feared—that mathematical discoveries of a 
stature comparable to that of calculus will be needed in order to produce decisive 
success in this field.” The immediate task of social science (economics being the
prototype of social science) is thus twofold: (1) continuation in the direction of 
the descriptive approach (“our knowledge of the relevant facts of economics is 
incomparably smaller than that commanded in physics at the time when the 
mathematization of that subject was achieved”) and (2) development of a math- 
ematical precision tool for a limited field. The authors’ scholarly modesty as well 
as their scientific and social philosophy are expressed in these sentences: “The 
great progress in every science came when, in the study of problems that were 
modest as compared with ultimate aims, methods were developed which could be 
extended further and further. . . .The sound procedure is to obtain first utmost 
precision and mastery in a limited field, and then to proceed to another, some- 
what wider one, and so on. This would do away with the unhealthy practice of 
applying so-called theories to economic or social reform where they are in no way 
useful.” This is an important point for the methodology and strategy of a good
many incipient scientific disciplines.

The authors’ advice is then briefly this: turn away from the “burning” ques-
tions; concern with them merely delays progress. Find out as much as you can
about the behavior of the individual and the simplest forms of exchange. Develop 
gradually a theory based on a “careful analysis of the ordinary everyday inter- 
pretation of economic facts.” This is a heuristic procedure: you are just groping 
your way from unmathematical plausibility considerations to a formal structure. 
Too bad! But this is the way to proceed if you want your final theory to be mathe- 
matically rigorous and conceptually general. In its first applications your results 
will appear trivial since they were never in doubt. But continue, work with more 
complicated problems until finally you will score your real successes when you can 
mathematically predict what will happen. 

The reviewer, who is not an economist, must confess here to a certain be- 
wilderment. He has no doubt that such a strategy has proved successful in the 
natural sciences where controlled experimentation led theory down a primrose 
path. He feels, however, less sanguine about the possibility of keeping “burning” 
or controversial issues out of the construction-job marked “theory of economics.” 
The economic facts of life seem much too interwoven with the behavior—rational 
or otherwise—of the individual and of society. The simple recipe of chanting “an 
economic fact is a fact is a fact” may not constitute a powerful enough incanta-
tion to dispel the next fellow’s or the next society’s plausibility considerations.
The authors were undoubtedly aiming at scientific neutrality. Still at least one 
reviewer “doubts whether their method based essentially on a capitalist form of 
production covers all rational economics” (3). It is safe to say that the goals and 
needs of a society interact with the building of an economic theory in a complex 
manner.* The way in which the Theory of Games has caught on in the fields of 
economic and military strategy makes it relatively safe to state that this mathe- 
matical structure too is keeping its appointment with burning questions (4). 
These somewhat critical remarks do not detract from the intrinsic value of the 
book and from the real enjoyment that is felt by the serious reader willing to dig 
through a prose heavily loaded with footnotes and references to earlier sections.†
The student will soon find himself fascinated by the ease with which combina- 
torics, set theory and linear algebra are developed under the very nose of unsus- 
pecting penny pitchers and poker players. 

*Some of these interaction problems were considered in recent papers read
before the Boston meetings of the Institute for the Unity of Science. See in par-
ticular the papers by Dr. A. Kaplan on “Scientific Method and Social Policy” and
Prof. Philipp Frank on “The Logical and the Sociological Aspects of Science.”
Dr. Kaplan was mainly concerned with the role of perspective, programmatic
and methodic scientism while Prof. Frank attempted to analyze the extra-scien-
tific factors responsible for the acceptance of a particular theory.

†A sample from section 15.4.3 (page 119): “The interpretation which we are
now going to give to the result of 13.5.3 is based on our considerations of 14.2-14.5
—particularly those of 14.5.1., 14.5.2—and for this reason we could not propose
it in 13.5.3.”

The Theory of Games approaches economic theory from the viewpoint of
the individual. It must therefore make certain assumptions concerning his mo-
tives. The authors do not hesitate to accept the traditional view according to
which the consumer wants to obtain a maximum of satisfaction and the entrepre-
neur a maximum of profits. Once maximization of utility has been stated as the
principle of rational behavior a further assumption is necessary before we can 
manipulate the variable “utility” numerically (for simplicity’s sake we might for 
instance decide to use monetary units to measure utility). We must “accept the 
picture of an individual whose system of preferences is all-embracing and com-
plete, i.e., who for any two imagined events (or combination of events with
stated probabilities) possesses a clear intuition of preference.” In their axiomatic 
treatment of utility von Neumann and Morgenstern combine this condition of a 
complete system of preferences with the condition of transitivity of preference 
relations into the single concept of complete ordering.* Our authors emphasize 
that they are dealing only with utilities experienced by one person with no im- 
plications concerning the comparison of utilities belonging to different individuals. 
Nobody should therefore expect to simply open the book in order to find weight- 
ing functions that would permit him to determine the utility function for a social 
group. In this connection von Neumann and Morgenstern state that the social 
maxim of “the greatest possible good for the greatest possible number” is self- 
contradictory, since “a guiding principle cannot be formulated by the requirement 
of maximizing two or more functions at once.” 

We are now almost ready to take a look at what constitutes a solution, i.e., a 
set of rules for a participant in an economic game. Since we cannot explore all 
possible types of games let us first see if we cannot categorize the types of pos- 
sible economic situations an individual might encounter. A brief summary of 
these categories resembles a primitive system of counting: one, two, many. In 
the Robinson Crusoe economy the mathematics is, theoretically, simple: there 
are a certain number of wants and a certain number of commodities and the 
problem is to obtain maximum satisfaction. Obviously an ordinary (though ad- 
mittedly multi-variable) maximum problem.—Now take the case of two partici- 
pants in a social exchange economy. This case turns out to have particular sig- 
nificance in the formulation of the whole theory. It has still certain elements in 
common with a maximum problem but certain radically new features have been 
added. Each participant attempts to maximize a function of which he does not 
control all the variables. 

As the number of players on the economic stage increases beyond two a new 
concept comes to the fore, the concept of coalition. By means of coalitions we can 
attempt to reduce a more complex exchange economy to what is essentially a two- 
person game, but this task is neither easy nor can it always be carried out con- 
vincingly. The hope still persists that, as in physics, it will some day be easier to 
deal statistically with an economy of 150 million people than with the problems 
involving exchanges between the butcher, the baker, and the candlestick maker. 
But the authors strongly insist that “only after the theory for moderate number
of participants has been satisfactorily developed will it be possible to decide wheth-
er extremely great numbers of participants simplify the situation.” They stress
that the analogy with the celestial vs. statistical mechanics situation tends to be
fallacious. There the general theory of the mechanics of several bodies is well
known. The difficulties that stem from the special, computational application of
the theory to, let us say, the solar system are greater than those encountered in
predicting the overall behavior of, for instance, 10²⁵ freely moving particles.

*The axiomatic treatment leaves utility a number determined up to a linear
transformation. For a discussion of the relation of transformation groups to psy-
chological scales see the Chapter by S. S. Stevens on “Mathematics, Measurement
and Psychophysics” in the forthcoming Handbook of Experimental Psychology
(S. S. Stevens, editor).

We have now seen under what circumstances the concept of utility can be 
handled numerically and we have some ideas concerning the types of economic 
situations we have to investigate. The aim of the inquiry is to find the mathemat- 
ically complete principles that define “rational behavior” for the participants in 
a social economy. While these principles ought to be perfectly general, it might 
be easier to start out by finding solutions for certain characteristic special cases. 
The next question is: how will we recognize a solution when we see one? Here is 
the intuitively plausible concept of a solution according to our authors: Each 
participant must have a set of rules telling him how to behave in every situation 
that may arise (in other words these rules make allowance for irrational behavior 
on the part of others). 

This is a point at which the great similarity between economics and the 
“everyday concept of games” is driven home. Games become now formally mathe- 
matical models for social and economic problems. They constitute ideal theoretical 
constructs: they are amenable to precise, exhaustive, and not too complicated defi- 
nitions; they further bear a resemblance to reality in the traits judged essential 
for the purposes at hand. The solution of the game which is derived from these 
constructs is in general an involved combinatorial catalogue. Its summary for the 
individual answers the question of how much he will get if he behaves “ration- 
ally.” This is the minimum he can get; if others make mistakes, he gets more. 

In particularly simple games the solution will consist of a single imputation, 
i.e., a single statement as to how the total proceeds are to be distributed among 
the participants. However, as soon as we get into more complicated games our 
solution undergoes parthenogenesis: the single imputation is replaced by a set of 
imputations. It turns out that this set of imputations is not ordered; in other 
words no single imputation is superior to (“dominates”) all others. To our au- 
thors this lack of transitivity is a most typical phenomenon in social organiza- 
tions. If the dominance relations between various states of society are of a cycli- 
cal nature (B is superior to A, C is superior to B and finally A is superior to C; 
compare this with the “paper form” of horses, or baseball teams if you wish), 
then we have not only different possible equilibrium positions but also a possibil- 
ity of passing from one of these equilibrium states to another. 

This brings us to the next important point which is the static nature of this 
theory. The authors are aware of the fact that a dynamic theory would be more 
complete and therefore preferable. Yet they feel that it is futile to try to build 
such a dynamic theory as long as the analysis of equilibrium states is not yet 
thoroughly understood. This static character of the theory is of course a very 
serious handicap if these models are to be used for the study of adaptive or learn- 
ing processes. 

Much of what we have discussed up to this point comes from the introductory 
chapter. Chapter 2 furnishes a general formal, set-theoretical description of 
games of strategy. Many of the important terms are defined in these 40 pages. 
The unifying concept of a player’s strategy emerges: it constitutes the plan of 
action which specifies what choices he will make in every possible situation. 

This formal model is now put to work. If we leave Robinson Crusoe playing 
solitaire on his island, the simplest of the remaining games is the zero-sum two- 
person game. Zero-sum games are an important classification of all possible 
games. As the name implies, the sum of all payments received by all players at 
the end of the game is always zero. Two-person games are simple because of the 
absence of coalitions. Under these circumstances the main problems can be form- 
ulated as follows: How does a player plan his strategy? How much information 
does he possess and what role does the amount of information play in determin- 
ing his moves? The zero-sum two-person game constitutes in other words a good 
dry run for the entire theory. 

Let us see what happens in this two-person game. Jim’s moves are deter- 
mined by the rules of the game and by his desire to win as much as possible. But 
he proceeds cautiously; he assumes that his strategy has already been found out 
by his opponent Joe. This is obviously the worst that could have happened to Jim. 
Jim chooses therefore a strategy that will assure him a gain that is not less 
than a certain amount (or a loss not greater than a certain amount). If Joe ac- 
tually is not as smart as Jim gives him credit for, Jim will be better off than 
he anticipates. 
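
Jim's cautious reasoning can be illustrated with a minimal sketch in Python. The payoff matrix and the name `security_level` are hypothetical, chosen only for illustration; they do not come from the book. For each of Jim's strategies (rows) we take the worst outcome Joe could inflict, and Jim then picks the row whose worst outcome is best.

```python
def security_level(payoffs):
    """Return (row index, guaranteed gain) for the row player, Jim.

    payoffs[i][j] is Jim's gain when Jim plays row i and Joe plays column j.
    Jim assumes Joe has found out his strategy and answers it as harshly
    as possible, so each row is judged by its minimum entry.
    """
    worst_per_row = [min(row) for row in payoffs]
    best_row = max(range(len(payoffs)), key=lambda i: worst_per_row[i])
    return best_row, worst_per_row[best_row]


# Hypothetical payoff matrix (Jim's gains), for illustration only.
example = [
    [3, -1, 2],
    [1, 0, 1],
    [-2, 4, -3],
]

row, value = security_level(example)
print(row, value)  # prints "1 0": the second row guarantees Jim at least 0
```

If Joe fails to find the harshest reply, Jim's actual gain in the chosen row can only exceed this guaranteed amount.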

The core of the zero-sum two-person game is constituted by the Min-Max 
(or Minimax) problem. With its help the authors show that games in which per- 
fect information prevails (like chess, for example) are particularly rational or 
strictly determined. For these games permanently optimal strategies exist. In this 
sense if the theory of chess were really fully known there would be nothing left to 
play. 
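
For a game written out as a payoff matrix, strict determinateness can be checked directly: the most the first player can guarantee himself (max-min) must equal the least the second player can hold him to (min-max). The following Python sketch assumes this matrix form; the function name and both matrices are hypothetical illustrations, not examples taken from the book.

```python
def is_strictly_determined(payoffs):
    """Compare max-min and min-max over pure strategies.

    payoffs[i][j] is the row player's gain. Returns a tuple
    (determined, maximin, minimax).
    """
    maximin = max(min(row) for row in payoffs)
    minimax = min(max(col) for col in zip(*payoffs))
    return maximin == minimax, maximin, minimax


# Hypothetical matrix with a saddle point (row 0, column 1, value 2).
with_saddle = [
    [4, 2, 5],
    [3, 1, 6],
    [0, -1, 7],
]

# Matching Pennies, with the assumed convention that the row player
# wins 1 on a match and loses 1 otherwise.
pennies = [
    [1, -1],
    [-1, 1],
]

print(is_strictly_determined(with_saddle))  # (True, 2, 2)
print(is_strictly_determined(pennies))      # (False, -1, 1)
```

When the two values coincide, neither player can improve on his optimal pure strategy even if it is known to the opponent.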

Then what about non-strictly determined games like Matching Pennies? Is 
there any hope that the “common-sense” behavior of players will yield a clue to a 
solution? Since it is hard to find out the intentions of your opponent, the next 
best thing in this game is to concentrate on avoiding having your own intentions 
found out. This can be done by playing a statistical or “mixed strategy” (play 
“tails” or “heads” with a probability mixture of 50:50) to protect yourself 
against loss. Our solution can therefore be couched in terms of mixed strategies. 
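
A short Python sketch may make the protective effect of the 50:50 mixture concrete. The payoff convention (the row player wins 1 on a match and loses 1 otherwise) and all names are assumptions made for illustration. The expected gain of the 50:50 mixture is computed against a range of opponent mixtures and compared with a pure strategy that has been found out.

```python
from fractions import Fraction

# Row player's gain in Matching Pennies (assumed convention:
# +1 if the two coins match, -1 otherwise).
PENNIES = [
    [1, -1],
    [-1, 1],
]


def expected_gain(payoffs, p_row, q_col):
    """Expected gain of the row player under mixed strategies p_row, q_col."""
    return sum(
        p_row[i] * q_col[j] * payoffs[i][j]
        for i in range(len(p_row))
        for j in range(len(q_col))
    )


half = [Fraction(1, 2), Fraction(1, 2)]

# Opponent mixtures from "always tails" to "always heads" in steps of 1/10.
opponent_mixes = [[Fraction(k, 10), 1 - Fraction(k, 10)] for k in range(11)]

gains_mixed = [expected_gain(PENNIES, half, q) for q in opponent_mixes]
gains_pure = [expected_gain(PENNIES, [1, 0], q) for q in opponent_mixes]

print(set(gains_mixed))  # {Fraction(0, 1)}: the 50:50 mix never loses on average
print(min(gains_pure))   # -1: a pure strategy that is found out loses every time
```

Whatever the opponent does, the 50:50 mixture yields an expected gain of zero, which is the best that can be guaranteed in this game.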

Before we go on to further theoretical considerations we get a chance to play 
a few elementary games like Stone, Paper, Scissors. We also watch Sherlock 
Holmes escape Professor Moriarty by helping him pick a good strategy. And as 
a special test we get initiated into the intricacies of a rather formalized stud 
poker. The emphasis is on bluffing. 

With the theory of the zero-sum two-person game as our base of operations 
we attack the zero-sum three-person game. The analysis is here dominated by the 
concept of coalitions: genesis, internal arrangements and understandings, strength 
and stability. From here we go on to the general treatment of the zero-sum n- 
person game, and finally to the most general type of game by removing the re- 
striction of the zero-sum. Here we leave the realm of games played for enter- 
tainment to enter the realm of economic reality since the sum of all payments, or 
the social product, is in general different from zero. 

A non-zero-sum game of n persons can be shown to be reducible to a zero- 
sum (n+1)-person game by the introduction of a fictitious player, Mr. (n+1). 
He turns out to be a handy mathematical gadget and a sad character at the same 
time. He obligingly permits us to make the sum of the amounts obtained by the 
players equal to zero by picking up the total check. In order to be able to do this 
he must have no influence whatsoever on the course of the game and remain 
excluded from all transactions connected with the game. 
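
The bookkeeping role of the fictitious player can be sketched in a few lines of Python. The function name and the sample amounts are hypothetical; the sketch shows only how his payment is fixed by the others' outcomes, not the book's formal construction.

```python
def add_fictitious_player(amounts):
    """Append the fictitious player's payment to the n real players' amounts.

    He receives the negative of the real players' total, so that the
    (n+1) amounts always sum to zero. He makes no choices of his own.
    """
    return list(amounts) + [-sum(amounts)]


# Hypothetical outcome of a three-person non-zero-sum game.
real_players = [5, 3, 2]
extended = add_fictitious_player(real_players)

print(extended)       # [5, 3, 2, -10]
print(sum(extended))  # 0
```

Since his amount is determined entirely by what the real players obtain, he "picks up the total check" without affecting the course of play.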

We have now run the whole economic gamut from Robinson Crusoe’s simple 
maximum problem to Mr. (n + 1)’s strange market place. In the remaining chap- 
ter von Neumann and Morgenstern discuss generalizations of their concepts of 
solution, domination and utility. 

The reviewer does not want to imply that all readers will get to these ulti- 
mate extensions of the theory. He believes that the reader will feel well rewarded if 
he works through the first five chapters, i.e., up to and including the zero-sum 
three-person game. Most of the fundamental ideas are developed in these 240 
pages, including the classic Minimax problem and the treatment of coalitions. He 
may by that time be able to think of some relevant applications of the theory 
which has unfolded before him or he may have gained enough courage to strike 
out in new directions. 

The reviewer does not feel competent to evaluate the promise the book holds 
for the future, nor to prescribe it as a wonder drug to those who are dealing with 
difficult quantitative problems in this area. He feels, however, that the effect of 
the von Neumann-Morgenstern opus will not be that of strong medicine but 
rather that of a beneficial catalyst in the thinking of social scientists. 

A new issue of Psychometric Monograph Series, No. 5, The De- 
scription of Aptitude and Achievement Tests in Terms of Rotated 
Factors, by John W. French, will be ready for distribution later this 
year. The issue will contain about 300 pages and the price will be 
$4.00. 


Previously published issues of the Psychometric Monograph Se- 
ries are as follows: 


Thurstone, L. L. Primary mental abilities. Psychometric Mono- 
graph No. 1, $2.00. 


Thurstone, L. L. and Thurstone, Thelma Gwinn. Factorial stud- 
ies of intelligence. Psychometric Monograph No. 2, $1.50. 


Wolfle, Dael. Factor analysis to 1940. Psychometric Monograph 
No. 3, $1.25. 


Thurstone, L. L. A factorial study of perception. Psychometric 
Monograph No. 4, $2.50. 


Orders for any issue should be sent to: 


THE UNIVERSITY OF CHICAGO PRESS 
5750 Ellis Avenue 
Chicago, Illinois 


The Psychometric Monograph Committee is composed of J. P. 
Guilford, Chairman; L. L. Thurstone, Harold Gulliksen, Paul Horst, 
and Frederic Kuder. Manuscripts and correspondence for this series 
should be addressed to: 


J. P. GUILFORD, Chairman 
Psychometric Monograph Committee 
Box 1134 
Beverly Hills, California 